A survey of techniques for recognizing 
parallel processable streams in 
computer programs* 

by C. V. RAMAMOORTHY and M. J. GONZALEZ 

The University of Texas 
Austin, Texas 


INTRODUCTION 

State-of-the-art advances — in particular, anticipated 
advances generated by LSI — have given fresh impetus 
to research in the area of parallel processing. The 
motives for parallel processing include the following: 

1. Real-time urgency. Parallel processing can 
increase the speed of computation beyond the 
limit imposed by technological limitations. 

2. Reduction of turnaround time of high priority 
jobs. 

3. Reduction of memory and time requirements 
for “housekeeping” chores. The simultaneous 
but properly interlocked operations of reading 
inputs into memory and error checking and 
editing can reduce the need for large inter- 
mediate storages or costly transfers between 
members in a storage hierarchy. 

4. An increase in simultaneous service to many 
users. In the field of the computer utility, for 
example, periods of peak demand are difficult to 
predict. The availability of spare processors 
enables an installation to minimize the effects 
of these peak periods. In addition, in the event 
of a system failure, faster computational speeds 
permit service to be provided to more users 
before the failure occurs. 


* This work was supported by NASA Grant NGR 44-012-144. 


5. Improved performance in a uniprocessor multi- 
programmed environment. Even in a unipro- 
cessor environment, parallel processable seg- 
ments of high priority jobs can be overlapped so 
that when one segment is waiting for I/O, the 
processor can be computing its companion 
segment. Thus an overall speed up in execution 
is achieved. 

With reference to a single program, the term “parallelism”
can be applied at several levels. Parallelism
within a program can exist from the level of statements 
of procedural languages to the level of micro operations. 
Throughout this paper, discussion will be confined to 
the more general “task” parallelism. The term “task” 
(process) generally is intended to mean a self-contained 
portion of a computation which once initiated can be 
carried out to its completion without the need for 
additional inputs. Thus the term can be applied to a 
single statement or a group of statements. 

In contrast to the way the term “level” was used 
above, task parallelism can exist at several levels within 
a hierarchy of levels. The statements of the main 
program of a FORTRAN program, for example, are 
said to be tasks of the first level. The statements within 
a subroutine called by the main program would then 
be second level tasks. If this subroutine itself called 
another subroutine, then the statements within the 
latter subroutine would be of the third level, etc. Thus 
a sequentially organized program can be represented 
by a hierarchy of levels as shown in Figure 1. Each 


1 



2 


Fall Joint Computer Conference, 1969 


LEVEL 1 LEVEL 2 LEVEL 3 LEVEL n 



Figure 1 Hierarchical representation of a sequentially 
organized program 


block within a level represents a single task; as before, 
a task can represent a statement or a group of state- 
ments. 

Once a sequentially organized program is resolved 
into its various levels, a fundamental consideration of 
parallel processing becomes prominent — namely that 
of recognizing tasks within individual levels which can 
be executed in parallel. Assuming the existence of a 
system which can process independent tasks in parallel, 
this problem can be approached from two directions. 
The first approach provides the programmer with 
additional tools which enable him to explicitly indicate 
the parallel processable tasks. If it is decided to make 
this indication independent of the programmer, then 
it is necessary to recognize the parallel processable 
tasks implicitly by analysis of the relationship between 
tasks within the source program. 

After the information is obtained by either of these 
approaches, it must still be communicated to and 
utilized by the operating system. At this point, efficient 
resource utilization becomes the prime consideration. 

The conditions which determine whether or not two
tasks can be executed in parallel have been investigated
by Bernstein [1]. Consider several tasks, T_i, of a
sequentially organized program illustrated by a flow
chart as shown in Figure 2(a). If the execution of
task T_3 is independent of whether tasks T_1 and T_2 are
executed sequentially as shown in Figure 2(a) or 2(b),
then parallelism is said to exist between tasks T_1 and
T_2. They can, therefore, be executed in parallel as
shown in Figure 2(c).

Figure 2. Sequential and parallel execution of a
computational process (panels a, b, c)

This “commutativity” is a necessary but not sufficient
condition for parallel processing. There may exist,
for instance, two processes which can be executed in
either order but not in parallel. For example, the inverse
of a matrix A can be obtained in either of the
two ways shown below.


(1)

a) Obtain the transpose of A

b) Obtain the matrix of cofactors of the transposed
matrix

c) Divide the result by the determinant of A

(2)

a) Obtain the matrix of cofactors of A

b) Transpose the matrix of cofactors

c) Divide the result by the determinant of A


Thus obtaining the matrix of cofactors and the trans- 
position operation are two distinct processes which can 
be executed in alternate order with the same result. 
They cannot, however, be executed in parallel. 

Other complications may arise due to hardware
limitations. Two tasks, for example, may need to access
the same memory. In this and similar situations,
requests for service must be queued. Dijkstra, Knuth,
and Coffman [2,3,4] have developed efficient scheduling
procedures for using common resources.

In terms of sets representing memory locations,
Bernstein has developed the conditions which must be
satisfied before sequentially organized processes can be
executed in parallel. These are based on four separate
ways in which a sequence of instructions can use a
memory location:

(1) The location is only fetched during the execution
of T_i.

(2) The location is only stored during the execution
of T_i.

(3) The first operation within a task involves a fetch
with respect to a location; one of the succeeding operations
of T_i stores in this location.

(4) The first operation within a task involves a store
with respect to a location; one of the succeeding operations
of T_i fetches this location.
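
The four ways of using a memory location can be sketched as a small classifier. This is only an illustration: the trace format (an ordered list of fetch and store operations for one task) and the name classify_locations are assumptions made for the example, not part of Bernstein's formulation.

```python
def classify_locations(ops):
    """Classify each location used by one task into usage types 1 to 4.

    ops is an ordered list of ("fetch" | "store", location) pairs
    (a hypothetical trace format chosen for this sketch).
    """
    history = {}
    for kind, loc in ops:
        history.setdefault(loc, []).append(kind)

    categories = {}
    for loc, kinds in history.items():
        if "store" not in kinds:
            categories[loc] = 1      # (1) only fetched
        elif "fetch" not in kinds:
            categories[loc] = 2      # (2) only stored
        elif kinds[0] == "fetch":
            categories[loc] = 3      # (3) fetched first, stored later
        else:
            categories[loc] = 4      # (4) stored first, fetched later
    return categories


# A made-up task: T = A + B; X = X + T; Y = 0
trace = [("fetch", "A"), ("fetch", "B"), ("store", "T"),
         ("fetch", "X"), ("fetch", "T"), ("store", "X"),
         ("store", "Y")]
usage = classify_locations(trace)
print(usage)  # {'A': 1, 'B': 1, 'T': 4, 'X': 3, 'Y': 2}
```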


Assuming a machine model in which processors are
allowed to communicate directly with the memory
and multi-access operations are permitted, the conditions
for strictly parallel execution of two tasks or
program blocks can be stated as follows.

(1) The areas of memory which Task 1 “reads”
and onto which Task 2 “writes” should be mutually
exclusive, and vice versa.

(2) With respect to the next task in a sequential
process, Tasks 1 and 2 should not store information in
a common location.

The conditions listed by Bernstein are sufficient to
guarantee commutativity and parallelism of two
program blocks. He has shown, however, that there do
not exist algorithms for deciding the commutativity or
parallelism of arbitrary program blocks.

As an example of what has been discussed here,
consider the tasks shown below which represent FORTRAN
statements for evaluation of three arithmetic
expressions.

X = (A+B) * (A-B)

Y = (C-D)/(C+D)

Z = X+Y

Because the execution of the third expression is independent
of the order in which the first two expressions
are executed, the first two expressions can be executed
in parallel.

Parallelism within a task can also exist when individual
components of compound tasks can be executed
concurrently. In the same manner that individual
processors can be assigned to independent tasks,
individual functional units can be assigned to independent
components within a task. The motivation
remains the same: a decrease in execution time of
individual tasks. The CDC 6600, for example, can
utilize several arithmetic units to perform several
operations simultaneously. This type of parallelism can
be illustrated by the arithmetic expression which
follows.

X = (A+B)*(C-D)

Normally, this expression would be evaluated in a
manner similar to that shown in Figure 3(a). The
independent components within the expression, however,
permit parallel execution as shown in Figure
3(b) with the same results.

Explicit and implicit parallelism 

In the explicit approach to parallelism, the programmer
himself indicates the tasks within a computational
process which can be executed in parallel. This is
normally done by means of additional instructions in
the programming language. This approach can be
illustrated by the techniques described by Conway,
Opler, Gosden, and others [5,6,7]. FORK in the FORK
and JOIN technique [5] indicates the parallel processability
of a specified set of tasks within a process. The
next sequence of tasks will not be initiated until all
the tasks emanating from a FORK converge to a
JOIN statement.

Figure 3. Illustration of parallelism within a compound
task (panels a, b)

In some instances, some of the parallel operations
initiated by the FORK instruction do not have to be
completed before processing can continue. For example,
one of these branch operations may be designed to
alert an I/O unit to the fact that it is to be used momentarily.
The conventional FORK must be modified
to take care of these situations. Execution of an IDLE
statement, for example, permits processors to be
released without initiation of further action [7]. The
FORK and JOIN technique is illustrated in
Figure 4.

Figure 4. FORK and JOIN technique
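
The FORK and JOIN idea can be imitated in a present-day language with a thread pool. The sketch below is an analogy only, not the instruction set described in the references. The helper name fork_join and the numeric values are assumptions. In CPython the threads show the control structure rather than a guaranteed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def fork_join(tasks):
    """Start every task (FORK), then wait for all of them (JOIN)."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(task) for task in tasks]   # FORK
        return [future.result() for future in futures]    # JOIN


A, B, C, D = 7, 3, 10, 4

# The two independent expressions are forked; Z waits for the join.
x, y = fork_join([lambda: (A + B) * (A - B),
                  lambda: (C - D) / (C + D)])
z = x + y
print(x)  # 40
```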

Another example of the explicit approach is the
PARALLEL FOR [7], which takes advantage of parallel
operations generated by the FOR statement in ALGOL
and similar constructs in other languages. For example,
the sum of two n × n matrices consists essentially of
n^2 independent operations. If n processors were available,
the addition process could be organized such that
entire rows or columns could be added simultaneously.
Thus the addition of the two matrices could be accomplished
in n units of time. Another example of this
approach is the programming language PL/I, which
provides the TASK option with the CALL statement
to indicate concurrent execution of parallel
tasks.
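
The row-wise organization of the matrix sum can be sketched as follows: one worker per row, each performing n additions, so that with n workers the whole sum takes on the order of n addition times. This is a modern analogy to PARALLEL FOR, not its original syntax, and the function names are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def add_rows(row_a, row_b):
    """One worker adds one entire row: n sequential additions."""
    return [a + b for a, b in zip(row_a, row_b)]


def parallel_matrix_add(A, B):
    """Add two n x n matrices, handing each row to its own worker."""
    n = len(A)
    with ThreadPoolExecutor(max_workers=max(1, n)) as pool:
        # map() keeps the rows in their original order.
        return list(pool.map(add_rows, A, B))


A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
total = parallel_matrix_add(A, B)
print(total)  # [[11, 22], [33, 44]]
```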

An additional way of indicating parallelism explicitly
is to write a language which exploits the parallelism in
algorithms to be implemented by the operating system.
This is the case with TRANQUIL [8,21], an ALGOL-like
language to be utilized by the array processors of
the ILLIAC IV. The situation is unique in that the
language was created after a system was devised to
solve an existing problem. “The task of compiling a
language for the ILLIAC IV is more difficult than
compiling for conventional machines simply because of
the different hardware organization and the need to
utilize its parallelism efficiently.” A limitation of this
approach is that programs written in that particular
language can only be run on array-type computers and
are, therefore, heavily machine dependent.

The implicit approach to parallelism does not depend 
on the programmer for determination of inherent 
parallelism but relies instead on indicators existing 
within the program itself. In contrast to the relative 
ease of implementation of explicit parallelism, the 
implicit approach is associated with complex compiling 
and supervisory programs. 

The detection of inherent parallelism between a set 
of tasks depends on thorough analysis of the source 
program using Bernstein's conditions. Implementation 
of a recognition scheme to accomplish this detection 
is dependent on the source language. Thus a recognizer 
which is universally applicable cannot be implemented. 

An algorithm developed by Fisher [9] approaches the
problem of parallel task detection in a general manner.
His algorithm utilizes the input and output sets of
each task (process) to determine essential ordering
and thus inherent parallelism. Given such information
as the number of processes to be analyzed, the input
and output set for each process, the given permissible
ordering among the processes, and any initially known
essential order among the processes, the algorithm
generates the essential serial ordering relation and the
covering for the essential serial ordering relation. This
covering provides an indication of the tasks within the
overall process which can be executed concurrently.

Basically, this work formalizes in the form of an
algorithm the conditions for parallel processing developed
by Bernstein. The conditions for parallel processing
between two tasks are extended to an overall process.

Detection of task parallelism — A new approach 

The next subject covered in this paper involves 
implicit detection of parallel processable tasks within 
programs prepared for serial execution. An indication 
is desired of the tasks which can be executed in parallel 
and the tasks which must be completed before the 
start of the next sequence of tasks. Thus the problem 
can be broken down in two parts — recognizing the 
relationships between tasks within a level and using 
this information to indicate the ordering between tasks. 

The approach presented here is based on the fact 
that computational processes can be modeled by 
oriented graphs in which the vertices (nodes) represent 
single tasks and the oriented edges (directed branches) 
represent the permissible transition to the next task 
in sequence. The graph (and thus the computational 
process) can be represented in a computer by means 
of a Connectivity Matrix, C [10,11]. C is of dimension
n × n such that C_ij is a “1” if and only if there is a
directed edge from node i to node j, and it is “0”
otherwise. The properties of the directed graph and 
hence of the computational process it represents can 
be studied by simple manipulations of the connectivity 
matrix. 
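
A connectivity matrix of this kind can be built directly from a list of directed edges. The sketch below uses a small made-up four-node graph, not the graph of Figure 5. The helper names are assumptions for illustration.

```python
def connectivity_matrix(n, edges):
    """Return the n x n matrix C with C[i][j] = 1 for each edge i -> j.

    Nodes are numbered 1..n as in the text; the lists are 0-indexed.
    """
    C = [[0] * n for _ in range(n)]
    for i, j in edges:
        C[i - 1][j - 1] = 1
    return C


def successors(C, i):
    """Nodes reached by one edge from node i (row i of C)."""
    return [j + 1 for j, bit in enumerate(C[i - 1]) if bit]


def predecessors(C, j):
    """Nodes with an edge into node j (column j of C)."""
    return [i + 1 for i, row in enumerate(C) if row[j - 1]]


# Hypothetical graph: 1 -> 2, 2 -> 3, 3 -> 2 (a loop), 3 -> 4
C = connectivity_matrix(4, [(1, 2), (2, 3), (3, 2), (3, 4)])
print(C)                   # [[0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1], [0, 0, 0, 0]]
print(successors(C, 3))    # [2, 4]
print(predecessors(C, 2))  # [1, 3]
```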

A graph consisting of a set of vertices is said to be 
strongly connected if and only if any node in it is reach- 
able from any other. A subgraph of any graph is defined 
as consisting of a subset of vertices with all the edges 
between them retained. A maximal strongly connected 
(M.S.C.) subgraph is a strongly connected subgraph 
that includes all possible nodes which are strongly 
connected with each other. Given a connectivity matrix 
of a graph, all its M.S.C. subgraphs can be determined 
simply by well-known methods [10]. A given program 
graph can be reduced by replacing each of its M.S.C. 
subgraphs by a single vertex and retaining the edges 
connected between these vertices and others. After 
the reduction, the reduced graph will not contain any 
strongly connected components. 
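
One way to carry out this reduction is sketched below. It is not necessarily the method of the cited reference: here the reachability matrix is computed with Warshall's algorithm, nodes that reach each other are grouped into one M.S.C. subgraph, and each group becomes a single vertex. All function names are assumptions for the example, and nodes are 0-indexed.

```python
def reachability(C):
    """Transitive closure of C by Warshall's algorithm."""
    n = len(C)
    R = [row[:] for row in C]
    for k in range(n):
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = 1
    return R


def msc_subgraphs(C):
    """Group nodes that are mutually reachable (0-indexed node lists)."""
    n = len(C)
    R = reachability(C)
    assigned = [False] * n
    groups = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [j for j in range(n) if j == i or (R[i][j] and R[j][i])]
        for j in group:
            assigned[j] = True
        groups.append(group)
    return groups


def reduce_graph(C):
    """Replace each M.S.C. subgraph by one vertex, keeping outside edges."""
    groups = msc_subgraphs(C)
    owner = {v: g for g, members in enumerate(groups) for v in members}
    m = len(groups)
    reduced = [[0] * m for _ in range(m)]
    for i, row in enumerate(C):
        for j, bit in enumerate(row):
            if bit and owner[i] != owner[j]:
                reduced[owner[i]][owner[j]] = 1
    return groups, reduced


# Hypothetical graph: 0 -> 1, 1 -> 2, 2 -> 1 (loop), 2 -> 3
C = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1], [0, 0, 0, 0]]
groups, reduced = reduce_graph(C)
print(groups)   # [[0], [1, 2], [3]]
print(reduced)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```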

The paragraphs which follow describe the sequence
of operations needed to prepare a program written
for a uniprocessor machine for parallel processing
in a multiprocessor computer.

(1) The first step is to derive the program graph
which identifies the sequence in which the computational
tasks are performed in the sequentially coded
program. Figure 5(a) illustrates an example program
graph. The program graph is represented in the computer
by its connectivity matrix. The connectivity
matrix for the example is given in Figure 5(b).

(2) By an analysis of the connectivity matrix, the
maximal strongly connected subgraphs are determined
by simple operations [10]. This type of subgraph is illustrated
by tasks 2 and 12 in Figure 5. Each M.S.C.
subgraph is next considered as a single task, and the
graph, called the reduced graph, is derived. The reduced
graph does not contain any loops or strongly
connected elements. In this graph, when two or more
edges emanate from a vertex, a conditional branching
is indicated. That is, the execution sequence will take
only one of the indicated alternatives. A vertex which
initiates the branching operation will be called a
decision or branch vertex. The reduced graph for the
example program graph is shown in Figure 6. In this
graph, vertex 3 represents a branch vertex.

Figure 5. Program graph of a serially coded program
and its connectivity matrix

(3) The next step is to derive the final program
graph and its connectivity matrix T. The elements of
T are obtained by analyzing the inputs of each vertex
in the reduced graph. An element, T_ij, is a “1” if
and only if the j-th task (vertex) of the reduced graph
has as one of its inputs the output of task i; otherwise
T_ij is a “0”. Figure 7 illustrates the final program graph for
the example after consideration is given to the input-output
relationships of each task. The connectivity
matrix for the final program graph is shown in Figure 8.

From the sufficiency conditions for task parallelism, 
two tasks can be executed in parallel if the input set of 
one task does not depend on the output set of the other 
and vice versa. The technique outlined in Step 4 detects 
this relationship and uses it to provide an ordering 
for task execution. 

(4) The vertices of the final program graph are
partitioned into “precedence partitions” [11] as follows.
Using the connectivity matrix T, a column (or columns)
containing only zeroes is located. Let this column
correspond to vertex v_1. Next delete from T both the
column and the row corresponding to this vertex. The
first precedence partition is P_1 = {v_1}. Using the remaining
portion of T, locate vertices {v_21, v_22, ...} which
correspond to columns containing only zeroes. The
second precedence partition P_2 thus contains vertices
{v_21, v_22, ...}. This implies that tasks in set P_2 =
{v_21, v_22, ...} can be initiated and executed in parallel
after the tasks in the previous partition (i.e., P_1) have
been completed. Next delete from T the columns and
rows corresponding to vertices in P_2. This procedure is
repeated to obtain precedence partitions P_3, P_4, ..., P_p
until no more columns or rows remain in the T matrix.
It can be shown that this partitioning procedure is
valid for connectivity matrices of graphs which contain
no strongly connected components.

Figure 6. Reduced program graph of the serially coded
program

Precedence partitions: {1}, {2}, {3,8}, {4,5,9,10},
{6,11,12}, {7,13}, {14}

Figure 8. Connectivity matrix of the final program
graph
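
The partitioning procedure of Step 4 can be sketched in a few lines. The matrix T below is rebuilt from the task inputs listed in Table I, since the matrix of Figure 8 itself is not reproduced here. The function and variable names are assumptions for illustration.

```python
def precedence_partitions(T):
    """Repeatedly peel off the vertices whose columns contain only zeroes.

    T[i][j] == 1 means task j+1 takes the output of task i+1 as input.
    Returns a list of partitions holding 1-based task numbers.
    """
    remaining = set(range(len(T)))
    partitions = []
    while remaining:
        ready = sorted(j for j in remaining
                       if all(T[i][j] == 0 for i in remaining))
        if not ready:
            raise ValueError("graph contains a strongly connected component")
        partitions.append([j + 1 for j in ready])
        remaining -= set(ready)   # delete their rows and columns
    return partitions


# Inputs of each task, as listed in Table I.
INPUTS = {1: [], 2: [1], 3: [2], 8: [2], 4: [3], 5: [3], 9: [8], 10: [8],
          6: [5], 11: [9], 12: [9], 7: [4, 6], 13: [10, 11, 12], 14: [7, 13]}

T = [[0] * 14 for _ in range(14)]
for task, sources in INPUTS.items():
    for source in sources:
        T[source - 1][task - 1] = 1

partitions = precedence_partitions(T)
print(partitions)
# [[1], [2], [3, 8], [4, 5, 9, 10], [6, 11, 12], [7, 13], [14]]
```

The ValueError branch reflects the restriction stated above: if a loop remains, no column is ever all zeroes and the procedure cannot proceed.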

The implication of this precedence partitioning is
that if P_1, P_2, ..., P_p correspond to times t_1, t_2, ..., t_p, the
earliest time that a task in partition P_i can be initiated
is t_i.

The final program graph contains the following types 
of vertices: (1) The branch or decision type vertex 
from which the execution sequence selects a task from 
a set of alternative tasks. (2) The Fork vertex which 
can initiate a set of parallel tasks. (3) The Join vertex 
to which a set of parallel tasks converge after their 
execution. (4) The normal vertex which receives its 
input set from the outputs of preceding tasks. Figure 7a 
indicates the final program graph with the first three 
types of vertices indicated by B, F, and J, respectively. 

(5) From precedence partitioning and the final 
program graph, a Task Scheduling Table can be 
developed. This table, shown in Table I, serves as an 
input to the operating system to help in the scheduling 
of tasks. For example, if the task being executed is a 
Fork task, a look-ahead feature of the system can 
prepare for parallel execution of the tasks to be initiated
upon completion of the currently active task.

(6) The precedence partitions of Step 4 provide an
indication of the earliest time at which a task may be
initiated. It is also desirable, however, to provide an
indication of the latest time at which a task may be
initiated. This information can be obtained by performing
precedence partitions on the transpose of the
T matrix. This process can be referred to as “row partitions”.
The implication here is that if task i is in the
partition corresponding to time period t_k, then t_k is
the latest time that task i can be initiated.

Using both the row and column partitions, the permissible
initiation time for each task can be derived as
shown in Table II. Task 4, for example, can be initiated
during t_4 or t_5 depending on the availability of
processors.
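
The earliest and latest initiation periods can be computed together, as sketched below. One interpretation is assumed here: the partitions of the transposed matrix are numbered from the last period backwards, which is what places task 14 in t_7. The task inputs are again those of Table I. The function names are assumptions for illustration.

```python
def partition_levels(T):
    """Map each vertex index to the 1-based partition in which it is peeled."""
    remaining, level, levels = set(range(len(T))), 0, {}
    while remaining:
        level += 1
        ready = [j for j in remaining
                 if all(T[i][j] == 0 for i in remaining)]
        if not ready:
            raise ValueError("graph contains a strongly connected component")
        for j in ready:
            levels[j] = level
        remaining -= set(ready)
    return levels


def initiation_windows(T):
    """Return {task: (earliest period, latest period)} with 1-based tasks."""
    n = len(T)
    earliest = partition_levels(T)                      # column partitions
    transpose = [[T[j][i] for j in range(n)] for i in range(n)]
    from_end = partition_levels(transpose)              # row partitions
    p = max(earliest.values())
    return {v + 1: (earliest[v], p + 1 - from_end[v]) for v in range(n)}


INPUTS = {1: [], 2: [1], 3: [2], 8: [2], 4: [3], 5: [3], 9: [8], 10: [8],
          6: [5], 11: [9], 12: [9], 7: [4, 6], 13: [10, 11, 12], 14: [7, 13]}
T = [[0] * 14 for _ in range(14)]
for task, sources in INPUTS.items():
    for source in sources:
        T[source - 1][task - 1] = 1

windows = initiation_windows(T)
print(windows[4])   # (4, 5)
print(windows[10])  # (4, 5)
print(windows[5])   # (4, 4)
```

In this example only tasks 4 and 10 have any slack; every other task has a single permissible period.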

At this point it is desirable to clarify some possible 
misinterpretations of the implications of this method. 
The method presented here does not try to determine 
whether any or all of the iterations within a loop can 
be executed simultaneously. Rather the iterations 
executed sequentially are considered as a single task. 


TABLE I. Task scheduling table

TIME   INPUTS TO TASKS   TASK NUMBER   TASK TYPE
t_1    -                 1
t_2    1                 2             FORK
t_3    2                 3             BRANCH
t_3    2                 8             FORK
t_4    3                 4
t_4    3                 5
t_4    8                 9             FORK
t_4    8                 10
t_5    5                 6
t_5    9                 11
t_5    9                 12
t_6    4, 6              7             JOIN
t_6    10, 11, 12        13            JOIN
t_7    7, 13             14            JOIN



For this reason, the undecidability problem introduced 
by Bernstein is not a factor here. 

In addition, precedence partitions may place the 
successors of a conditional within the same partition. 
The interpretation of this is that only one of the suc- 
cessors will be executed, and it can be executed in 
parallel with the other tasks within that partition. 

The FORTRAN parallel task recognizer 

In order to determine the degree of applicability of 
the method described above, it was decided to apply 
the method to a sample FORTRAN program. This 
was accomplished by writing a program whose input 
consists of a FORTRAN source program; its output 
consists of a listing of the tasks within the first level 
of the source program which can be executed in parallel. 
The program written to accomplish this parallel task 
detection is known in its final form as a FORTRAN
Parallel Task Recognizer [13].

TABLE II. Permissible task initiation time

COLUMN PARTITIONS
TIME   TASK
t_1    1
t_2    2
t_3    3, 8
t_4    4, 5, 9, 10
t_5    6, 11, 12
t_6    7, 13
t_7    14

ROW PARTITIONS
TIME   TASK
t_1    1
t_2    2
t_3    3, 8
t_4    5, 9
t_5    4, 6, 10, 11, 12
t_6    7, 13
t_7    14

PERMISSIBLE TASK INITIATION PERIODS
TASK   TIME
1      t_1
2      t_2
3      t_3
4      t_4, t_5
5      t_4
6      t_5
7      t_6
8      t_3
9      t_4
10     t_4, t_5
11     t_5
12     t_5
13     t_6
14     t_7

The recognizer, also written in FORTRAN, relies
on indicators generated by the way in which the
program is actually written. Consider the expressions
given below.

X1 = f_1(A,B)

X2 = f_2(C,D)

Because the right-hand side of the second expression
does not contain a parameter generated by the computation
which immediately precedes it, the two expressions
can be executed in parallel. If, on the other hand,
the expressions were rewritten as shown below, the
termination of the first computation would have to
precede the initiation of the second.

X1 = f_1(A,B)

X2 = f_2(X1,C)

The recognizer performs this determination by comparing
the parameters on the right-hand side of the equals
sign to outcomes generated by previous statements.

Other FORTRAN instructions can be analyzed 
similarly. Consider the arithmetic IF: 

IF (X - Y) 3,4,5 

Here the parameters within the parentheses must be 
compared to the outputs of preceding statements in 
order to determine essential order. 
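
The comparison of right-hand-side parameters against the outputs of earlier statements can be sketched as below. This is a simplified illustration, not the recognizer's actual code: it handles only plain assignment statements, uses a regular expression to pick out names, and records only the case where a statement uses a value produced earlier. The function names are assumptions.

```python
import re


def parse_assignment(statement):
    """Split 'X = expr' into the output name and the set of names in expr."""
    lhs, rhs = statement.split("=", 1)
    return lhs.strip(), set(re.findall(r"[A-Za-z][A-Za-z0-9]*", rhs))


def dependence_matrix(statements):
    """C[i][j] = 1 when statement j uses the value last produced by i."""
    n = len(statements)
    C = [[0] * n for _ in range(n)]
    last_writer = {}                      # name -> index of producing statement
    for j, statement in enumerate(statements):
        output, inputs = parse_assignment(statement)
        for name in inputs:
            if name in last_writer:       # match found: make an entry in C
                C[last_writer[name]][j] = 1
        last_writer[output] = j
    return C


program = ["X = (A+B) * (A-B)",
           "Y = (C-D)/(C+D)",
           "Z = X+Y"]
C = dependence_matrix(program)
print(C)  # [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
```

The first two columns are all zero, so the first two statements fall into the same precedence partition, and the third statement follows them.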

Other FORTRAN instructions are analyzed in a 
similar manner in order to generate the connectivity 
matrix for the source program. During this analysis 
the recognizer assigns numbers to the executable 
statements of the source program. After this is com- 
pleted, the recognizer proceeds with the method of 
precedence partitions described earlier. Precedence
partitions yield a list of blocks which contain the numbers
of the statements which can be executed concurrently.

Figure 9 shows a block diagram of the steps taken by 
the recognizer to generate the parallel processable 
tasks within the first level of a FORTRAN source 
program. 

Some statements within the FORTRAN set are 
treated somewhat differently. The DO statement, for 
example, does not itself contain any input or output 
parameters but instead generates a series of repeated 
operations. Because of the loop considerations men- 
tioned earlier, and because the rules of FORTRAN 
require entrance into a loop only through the DO 
statement, all the statements contained within a DO 
loop are considered as a single task. A loop, however, 
may contain a large number of statements, and a great 
amount of potential parallelism may be lost if con- 
sideration is not given to the statements within the 
loop. For this reason, the recognizer generates a sepa- 
rate connectivity matrix for each DO loop within the 
program. 

The recognizer itself possesses limitations which 
must be eliminated before it can be applied to programs 
of a complex nature. For example, only a subset of 
the entire FORTRAN set is considered for recognition. 
This could be corrected by expanding the recognition 
process to include a more complete set of instructions. 

In addition to the DO statement, loops can also be
created by branch and transfer operations such as
the IF and GO TO instructions. To eliminate these
loops, it would be necessary to analyze the connectivity
matrix in the manner mentioned earlier before
beginning the process of precedence partitions. The
recognizer does not presently perform this analysis.

Figure 9. Block diagram of the FORTRAN
parallel task recognizer. Box labels, in flow order:

(1) START
(2) READ NEXT SOURCE PROGRAM INSTRUCTION
(3) ASSIGN A STATEMENT NUMBER TO THIS INSTRUCTION
(4) IF THIS TASK IS THE SUCCESSOR OF A BRANCH OR TRANSFER
OPERATION, RECORD THIS INFORMATION
(5) RECORD INPUT AND OUTPUT PARAMETERS REQUIRED BY
THIS TASK
(6) SCAN EXECUTABLE STATEMENTS AND COMPARE INPUT
PARAMETERS TO OUTPUTS OF PREVIOUS STATEMENTS
(7) WHEN MATCH IS FOUND, MAKE ENTRY IN C, I.E., SHOW A
CONNECTION FROM PREDECESSOR TO SUCCESSOR
(8) AFTER GENERATION OF C IS COMPLETE, GENERATE
PRECEDENCE PARTITIONS
(9) USING THE ASSIGNED STATEMENT NUMBERS, INDICATE THOSE
TASKS WITHIN THE FIRST LEVEL WHICH CAN BE DONE IN
PARALLEL
(10) END

Nested DO loops are not permitted, and the source 
program size is limited in the number of executable 
statements it may have and in the number of param- 
eters any one statement can contain. 

Some of these limitations could be eliminated quite
easily; others would require a considerable amount of
effort. To allow a source program of arbitrary size
would require a somewhat more elaborate handling of
memory requirements and associated problems. At the
present time the recognizer consists of a main program
and six subroutines. In its present form the recognizer
consists of approximately 1300 statements.

(a) Sample program

      C THIS IS A TEST PROGRAM DESIGNED TO CHECK PPS
        DIMENSION A1(10),A2(10),A3(10)
        INTEGER A1,A2,ABC,A2,X2,B,C,D
 1      READ 100,(A1(I),I=1,10),B,C,D
 2      READ 100,(A2(I),I=1,10),NS,NST,NSTU
 3      DO 10 I=1,10
 4      IF(A1(I)-A2(I))20,30,40
 5   20 X1=(A1(I))*(B-C)
 6      X2=D+(B/C)
 7      A3(I)=X1*X2
 8   10 CONTINUE
      C THIS IS A TEST COMMENT
 9   30 PRINT 200,B,C,D
10   40 CALL ALPHA(A1,A2,ABC,B4,B5)
11      PRINT 3057,X1,X2,(A3(I),I=1,10)
12      CALL BETA(X1,X2,A3,B6)
13      IF(B4-B5)50,50,60
14   50 READ 315,E,F,G,H
15      X3=(E*F)+(G-H)
16      X4=B6+G
17      X5=X3-X4
18      X6=(B4+B5)*X5
19   60 PRINT 4,X3,X4,X5
20      PRINT 52,(A1(I),I=1,10),ABC,C,(A3(I),I=1,10)
    100 FORMAT(10I2,3I3)
    200 FORMAT(1HO ,8 B C D*,/,3I3)
   3057 FORMAT(1H ,2I3,10F7.1)
    315 FORMAT(4F7.4)
      4 FORMAT(3F7.4)
     52 FORMAT(12I3,10F7.1)
21      END

(b) Parallel processable tasks

(1,2)
(3)
(9,10,11,12)
(13)
(14)
(15,16)
(17)
(18,19,20)

Figure 10. An example of the recognition process.


The recognizer is presently written in such a manner 
that it will detect only first level parallelism. The 
method it uses, however, can be applied to parallelism 
at any level. 

The theory of operation of the FORTRAN parallel 
task recognizer will be illustrated by applying the 
recognition techniques to a sample FORTRAN program. 
Figure 10(a) is a listing of the sample program showing 
the individual tasks. Figure 10(b) is a listing of the 
parallel processable tasks as determined by precedence 
partitions. The numbers to the left of the executable 
statements are the numbers assigned by the recognizer 
during the recognition phase. 

Elimination of the limitations mentioned here and 
other limitations not mentioned explicitly will be the 
subject of future effort. 

Observations and comments 

Regardless of the manner in which the subject of 
parallel processing is approached, common problems 
arise. Prominent among these is a need to protect 
common data. If two tasks are considered for con- 
current execution and one task accesses a memory 
location and the other amends it, then strict observance 
must be paid to the order in which this is done. The 











FORTRAN recognizer, for example, may determine 
that two subroutines can be executed in parallel. At 
the present time no consideration is given to the fact 
that both subroutines may access common data 
through COMMON or EQUIVALENCE statements. 

In order to truly optimize execution time for a 
program which is set up for parallel processing, it 
would be highly desirable to determine the time re- 
quired for execution of the individual tasks within 
the process. It is not enough to merely determine that 
two tasks can be executed concurrently; the primary 
goal is that this parallel execution result in higher 
resource utilization and improved throughput. If the 
time required for the execution of one task is 100 times 
that of the other, for example, then it may be desirable 
to execute the two tasks serially rather than in parallel. 
The reasoning here is that no time would be spent
in allocating processors and so forth.

Determination of task execution time, however, is 
not a simple matter. Exhaustive measurements of the 
type suggested by Russell and Estrin [14] would provide 
the type of information mentioned here. 

Another problem area involves implementation of 
special purpose languages such as TRANQUIL. It 
was mentioned earlier that programs written in a 
language of this type are highly machine-limited. It 
would be highly desirable to be able to implement 
programs written in these languages in systems which 
are not designed to take advantage of parallelism. 
Along these lines, the programming generality suggested
by Dennis [16] may be significant.

It should be pointed out that all the techniques 
which have been discussed here will create a certain 
amount of overhead. For this reason it is felt that a 
parallel task recognizer, for example, would be best 
suited for implementation with production programs. 
Thus even though some time would be lost initially, 
in the long run parallel processing would result in a 
significant net gain. 

Conclusions 

The method of indicating parallel processable tasks 
introduced here and illustrated in part by the FOR- 
TRAN Parallel Recognizer appears to provide enough 
generality that it is independent of the language, the 
application, the mode of compilation, and the number 
of processors in the system. It is anticipated that this 
method will remain as the basis for further effort in 
this area. 

In addition to the comments made earlier, some 
possible future areas of effort include determination of 


possible parallelism of individual iterations within a 
loop. It is hoped that additional information can be 
provided to the operating system other than a mere 
indication of the tasks which can be executed in paral- 
lel. This would include the measurements mentioned 
earlier and an indication of the frequency of execution 
of individual tasks. 

It is also hoped that a sub-language may be de- 
veloped which can be added to existing languages to 
assist in the recognition process and the development 
of recognizer code. 

Detection of parallel components within 
compound tasks 

Several algorithms exist for the detection of inde- 
pendent components within compound tasks. 16,17,18,19 
These algorithms are concerned primarily with de- 
tection of this type of parallelism within arithmetic 
expressions. The first three algorithms referenced 
above are summarized in [19] where a new algorithm 
is also introduced. 

The arithmetic expression which will be used as an 
example for each algorithm is given below. 

A + B + C + D*E*F + G + H 

Throughout this discussion the usual precedence 
between operators will apply. In order of increasing 
precedence, the operators are ranked as follows: 
+ and -, * and /, and ↑, where ↑ stands 
for exponentiation. 

Hellerman’s algorithm 

This algorithm assumes that the input string is 
written in reverse Polish notation and contains only 
binary operators. The string is scanned from left to 
right replacing by temporary results each occurrence 
of adjacent operands immediately followed by an 
operator. These temporary results will be considered 
as operands during the next passes. Temporary results 
generated during a given pass are said to be at the 
same level and therefore can be executed in parallel. 
There will be as many passes as there are levels in the 
syntactic tree. The compilation of the expression 
listed above is shown in Figure 11. 
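
The pass structure just described can be sketched in Python. This is an illustrative sketch under stated assumptions, not Hellerman's published procedure: the function name `hellerman_passes`, the representation of the string as a list of single-symbol tokens, and the names R1, R2, ... for temporary results are all choices made here for the example.

```python
OPERATORS = set("+-*/")


def hellerman_passes(rpn_tokens):
    """Scan a reverse Polish token list repeatedly, replacing every
    'operand operand operator' triple by a temporary result.

    Returns one list per pass; each entry is (temp, left, operator, right).
    Temporaries created in the same pass are independent of one another.
    """
    tokens = list(rpn_tokens)
    passes = []
    counter = 0
    while len(tokens) > 1:
        output, generated = [], []
        i = 0
        while i < len(tokens):
            is_triple = (
                i + 2 < len(tokens)
                and tokens[i] not in OPERATORS
                and tokens[i + 1] not in OPERATORS
                and tokens[i + 2] in OPERATORS
            )
            if is_triple:
                counter += 1
                temp = f"R{counter}"
                generated.append((temp, tokens[i], tokens[i + 2], tokens[i + 1]))
                output.append(temp)  # the temporary is an operand next pass
                i += 3
            else:
                output.append(tokens[i])
                i += 1
        if not generated:
            raise ValueError("input is not a well-formed reverse Polish string")
        tokens = output
        passes.append(generated)
    return passes


if __name__ == "__main__":
    for level, results in enumerate(hellerman_passes("AB+C+DE*F*+G+H+"), start=1):
        print(level, results)
```

Applied to the example string, the sketch needs five passes, with R1 and R2 produced together in the first pass and R3 and R4 in the second.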

Although this algorithm is simple and fast, it has 
two shortcomings. The first is a possible difficulty in 
implementation since it requires the input string to 
be in Polish notation; the second is its inability to 
handle operators which are not commutative. 

 i   INPUT STRING AFTER      TEMPORARY RESULTS GENERATED 
     THE ith PASS            DURING THE ith PASS 

 0   AB+C+DE*F*+G+H+ 
 1   R1 C+R2 F*+G+H+         R1=A+B     R2=D*E 
 2   R3 R4+G+H+              R3=R1+C    R4=R2*F 
 3   R5 G+H+                 R5=R3+R4 
 4   R6 H+                   R6=R5+G 
 5   R7                      R7=R6+H 

(The syntactic tree in the original figure spans levels 0 through 5.) 

Figure 11: Parallel computation of 
A+B+C+D*E*F+G+H using Hellerman’s 
algorithm 


Stone’s algorithm 

The basic function of this algorithm is to combine 
two subtrees of the same level into a level that is one 
higher. For example, A and B, initially of level 0, are 
combined to form a subtree of level 1. The algorithm 
then searches for another subtree of level 1 by attempt- 
ing to combine C and D. Since precedence relation- 
ships between operators prohibit this combination, the 
level of subtree (A+B) is incremented by one. The 
algorithm now searches for a subtree of level 2 by 
attempting to combine C, D, and E. Since this com- 
bination is also prohibited, subtree (A+B) is incre- 
mented to level 3. The next search is successful, and a 
subtree of level 3 is obtained by combining C, D, E 
and F. These two subtrees are then combined to form a 
single subtree of level 4. 

In a similar manner the subtree (G+H), originally 
of level 1, is successively incremented until it achieves 
a level of 4; at that time it is combined with the other 
subtree of the same level to form a final tree of level 5. 

The algorithm yields an output string in reverse 
Polish which does not expressly show which operations 
can be performed in parallel. Even though the output 
string is generated in one pass, the recursiveness of 


the algorithm causes it to be slow, and at least one 
additional pass would be required to specify parallel 
computations. 

Squire's algorithm 

The goal of this algorithm is to form quintuples of 
temporary results of the form : 

Ri (operand 1, operator, operand 2, start level 
= max [end level op. 1; end level op. 2], end level = 
start level+1). 

All temporary results which have the same start level 
can be computed in parallel. Initially, all variables 
have a start and end level equal to zero. 
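
The level rule in the quintuple definition can be sketched in Python. The sketch covers only the level bookkeeping, not the right to left and left to right scans that choose the operand pairs; the pairs are supplied by hand, following Figure 12. The function names and the tuple layout are assumptions made for illustration.

```python
def assign_levels(triples):
    """Turn (result, operand1, operator, operand2) triples into quintuples.

    start level = max(end level of operand 1, end level of operand 2)
    end level   = start level + 1
    Plain variables have an end level of zero.
    """
    end_level = {}
    quintuples = []
    for result, op1, operator, op2 in triples:
        start = max(end_level.get(op1, 0), end_level.get(op2, 0))
        end = start + 1
        end_level[result] = end
        quintuples.append((result, op1, operator, op2, start, end))
    return quintuples


def group_by_start_level(quintuples):
    """Collect the temporary results that share a start level; the
    members of each group can be computed in parallel."""
    groups = {}
    for result, _op1, _operator, _op2, start, _end in quintuples:
        groups.setdefault(start, []).append(result)
    return groups


if __name__ == "__main__":
    # Operand pairs in the order they appear in Figure 12.
    pairs = [
        ("R1", "D", "*", "E"),
        ("R2", "F", "*", "R1"),
        ("R3", "A", "+", "B"),
        ("R4", "C", "+", "G"),
        ("R5", "H", "+", "R3"),
        ("R6", "R4", "+", "R5"),
        ("R7", "R2", "+", "R6"),
    ]
    print(group_by_start_level(assign_levels(pairs)))
```

For the example expression the groups are R1, R3, R4 at start level 0, R2 and R5 at level 1, R6 at level 2, and R7 at level 3.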

Scanning begins with the rightmost operator of the 
input string and proceeds from right to left until an 
operator is found whose priority is lower than that of 
the previously scanned operator. In the example this 
scan would yield the following substring: 

D*E*F+G+H 

Now a left to right scan proceeds until an operator is 
found whose priority is lower than that of the left- 
most operator of the substring. This yields: D*E*F. 
At this point a temporary result R1 is available of the 
form: 

R1(D,*,E,0,1). 

The temporary result, R1, replaces one of the operands, 
and the other is deleted together with its left operator. 
The new substring is then: 

R1*F+G+H. 

The left to right scans are repeated until no further 
quintuple can be produced, and at that time, the right 
to left scan is re-initiated. The results of the process 
are shown in Figure 12. 

Although the example shows the algorithm applied 
to an expression containing only binary operators, the 
algorithm can also handle subtraction and division 
with a corresponding increase in complexity. 

A significant feature of this algorithm is that Polish 
notation plays no part in either the input string or 
the output quintuples. Because of the many scans and 
comparisons the algorithm requires, it becomes more 
complex as the length of the expression and the di- 
versity of operators within the expression increase. 

INITIAL STRING: A+B+C+D*E*F+G+H 

RIGHT TO LEFT SCAN      LEFT TO RIGHT SCAN 

D*E*F+G+H               R1*F+G+H 
                        R2+G+H 
A+B+C+R2+G+H            R3+C+R2+G+H 
                        R4+R3+R2+H 
                        R4+R5+R2 
                        R6+R2 
                        R7 

QUINTUPLES 
                                        LEVEL 
        Op. 1    OPERATOR    Op. 2    START    END 

R1      D        *           E        0        1 
R2      F        *           R1       1        2 
R3      A        +           B        0        1 
R4      C        +           G        0        1 
R5      H        +           R3       1        2 
R6      R4       +           R5       2        3 
R7      R2       +           R6       3        4 

Figure 12: Parallel computation of 
A+B+C+D*E*F+G+H using Squire’s algorithm 

Baer and Bovet’s algorithm 

The algorithm uses multiple passes. To each pass 
corresponds a level. All temporary results which can 
be generated at that level are constructed and inserted 
appropriately in the output string produced by the 
corresponding pass. Then, this output string becomes 
the input string for the next level until the whole 
expression has been compiled. Thus the number of 
passes will be equal to the number of levels in the 
syntactic tree. During a pass the scanning proceeds 
from left to right and each operator and operand is 
scanned only once. 

The simple intermediate language which this al- 
gorithm produces is the most appropriate for multi- 
processor compilation in that it shows directly all 
operations which can be performed in parallel, namely 
those having the same level number. The syntactic 
tree generated by this algorithm is shown in Figure 
13. 

Figure 13: Parallel computation of 
A+B+C+D*E*F+G+H using Baer and 
Bovet’s algorithm 

A new algorithm 

This section will introduce a technique whose goals 
are: (1) to produce a binary tree which illustrates the 
parallelism inherent in an arithmetic expression; and 
(2) to determine the number of registers needed to 
evaluate large arithmetic or Boolean expressions with- 
out intermediate transfers to main memory. 

This technique is prompted by the fact that existing 
computing systems possess multiple arithmetic units 
which can contain a large number of active storages 
(registers). In addition, the superior memory band- 
widths of the next generation of computers will simplify 
some of the requirements of this technique. 

In the material presented below, a complex arithmetic 
expression is examined to determine its maximum 
computational parallelism. This is accomplished by 
repeated rearrangement of the given expression. During 
this process the given expression in reverse Polish form 
is also tested for “well formation”, i.e., errors and 
oversights in the syntax, etc. 

The arithmetic expression which was used as a model 
earlier will also be used here, namely A+B+C+D*E*F+G+H. 
The details of the algorithm follow: 

(1) The first step is to rewrite the expression in 
reverse Polish form and to reverse its order. 

+H+G+*F*ED+C+BA 

(2) Starting with the rightmost symbol of the string, 
assign a weight to each member of the string based on 
the following procedure: 

Assign to symbol S_i the value V_i = V_{i-1} + R_i, 
i = 1, 2, ..., n 

where R_i = 1 - δ(S_i), given that 

δ(S_i) = 0 if S_i is a variable 
δ(S_i) = 1 if S_i is a unary operator 
δ(S_i) = 2 if S_i is a binary operator 

and V_{i-1} = V_{i-2} + R_{i-1}, V_{i-2} = V_{i-3} + R_{i-2}, 
etc., 

such that V_{i-(i-1)} = V_1 = R_1, and V_0 = 0. 

Using this procedure, the following expression results: 

i      15  14  13  12  11  10   9   8   7   6   5   4   3   2   1 
S_i     +   H   +   G   +   *   F   *   E   D   +   C   +   B   A 
V_i     1   2   1   2   1   2   3   2   3   2   1   2   1   2   1 

The symbol at i = 15 is the root node, and its weight is V_n. 


Note that for a “well-formed expression” of n symbols 

Vn = 1. 
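
The weight assignment of step 2 can be sketched in Python. The function names are hypothetical, and the treatment of unary operators through a caller-supplied set is an assumption made for the example. The check that every running weight stays at or above 1 is an addition made here; the text itself states only the condition V_n = 1.

```python
BINARY_OPERATORS = set("+-*/")


def weights(reversed_polish, unary_operators=frozenset()):
    """Return [V_1, ..., V_n] for a reversed reverse-Polish string.

    Symbol i = 1 is the rightmost symbol of the string, so the scan
    runs from right to left, with V_i = V_{i-1} + R_i and V_0 = 0.
    """
    values = []
    v = 0
    for symbol in reversed(reversed_polish):
        if symbol in unary_operators:
            delta = 1
        elif symbol in BINARY_OPERATORS:
            delta = 2
        else:
            delta = 0  # a variable
        v += 1 - delta  # R_i = 1 - delta(S_i)
        values.append(v)
    return values


def is_well_formed(reversed_polish, unary_operators=frozenset()):
    """V_n = 1 is the condition given in the text; requiring every
    V_i >= 1 is an extra check added in this sketch."""
    values = weights(reversed_polish, unary_operators)
    return bool(values) and values[-1] == 1 and min(values) >= 1


if __name__ == "__main__":
    print(weights("+H+G+*F*ED+C+BA"))
    print(is_well_formed("+H+G+*F*ED+C+BA"))
```

Read from i = 15 down to i = 1, the weights produced for the example string agree with the table above, and the highest weight, 3, first occurs at i = 7.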

(3) At this point the root node of the proposed 
binary tree can be determined. Thus the given string 
can be divided into two independent substrings. To 
determine the root node, draw a line to the left of the 
first symbol with a weight of 1 (i = 11, S_i = +, V_i = 1) 
to the left of the symbol with the highest weight, 
V_m (i = 7, S_i = E, V_i = V_m = 3). The two independent 
substrings consist of the strings to the left and to the 
right of this line. The root node will be the leftmost 
member of the string to the left of the line (i = 15, 
S_i = +, V_i = 1). Note that V_i also equals 3 for i = 9; 
however, V_m is chosen from the earliest occurrence of 
a symbol with the highest weight. 

(4) The next step is to look for parallelism within 
each of the new substrings. Consider the rightmost 
substring. Form a new substring consisting of the 
symbols within the values of V_i = 1 to the right and to 
the left of V_m. Transpose this substring with the sub- 
string to the right of it whose leftmost member has a 
weight of V_i = 1. 

INITIAL RIGHTMOST SUBSTRING 
S_i     +   *   F   *   E   D   +   C   +   B   A 
V_i     1   2   3   2   3   2   1   2   1   2   1 

FINAL RIGHTMOST SUBSTRING 
i      11  10   9   8   7   6   5   4   3   2   1 
S_i     +   +   C   +   B   A   *   F   *   E   D 
V_i     1   2   3   2   3   2   1   2   1   2   1 

This procedure is repeated until the initial V_m occupies 
the position i = 2 in the substring. For this example 
this is already the case. Thus the rightmost substring 
is in the proper form. 

(5) The transposition procedure of step 4 is applied 
next to the leftmost substring. However, since the 
leftmost substring of this example consists of only two 
operands and one operator, no further operations are 
necessary. 

(6) The resultant binary tree is shown in Figure 14. 
The numbers assigned to each node represent the final 
weight Vi of the symbol as determined in steps 1-5 
above. 

Some observations and comments on this algorithm 
are given below. 

(1) The two branches on either side of the root node 
can be executed in parallel. Within each main branch, 
the transposition procedure of step 4 yields supplemen- 
tary root nodes. The sub-branches on each side of the 
supplementary nodes can be executed in parallel. 

Figure 14: Binary tree for parallel computation of 
A+B+C+D*E*F+G+H 

(2) The number of levels in the binary tree can be 
predicted from the Polish form of the original string: 

No. of LEVELS = MAX [NUMBER OF 1's; V_m] 

in the substring (rightmost or leftmost) containing V_m. 

(3) The tree is traversed in a modified postorder 
form. 20 The resulting expression is 

D*E*F+A+B+C+G+H 

(4) An added feature of this technique is that the 
number of registers required to evaluate this expression 
without intermediate STORE and FETCH operations 
is obtained directly from the binary tree. This infor- 
mation is provided by the highest weight assigned to 
any node within the tree. Thus for this example the 
expression could be evaluated using at most two 
registers without resorting to intermediate stores and 
fetches. 

(5) This technique of recognizing parallelism on a 
local level has been applied to a single instruction, in 
particular, an arithmetic expression. It is worthwhile 
mentioning that each variable within the expression 
can itself be the result of a processable task. Thus this 
technique can be extended to a higher level of parallel 
stream recognition, i.e., level parallelism. 

In order to implement the techniques mentioned 
here for components within tasks and the techniques 
mentioned earlier for individual tasks, several system 
features are desirable. Schemes for detecting parallel 
processable components within compound tasks are 
oriented primarily toward arithmetic expressions. For 
these situations string manipulation ability would be 
highly desirable. Since individual tasks are represented 
by a graph and its matrix, the ability to manipulate 
rows and columns easily would be very important. 
In this same area, an associative memory 
could greatly reduce execution time in the implementation 
of precedence partitions. 


Performance modeling and empirical 
measurements in a system designed for 
batch and time-sharing users 

by JACK E. SHEMER and DOUGLAS W. HEYING 

Scientific Data Systems, a Xerox Company 
El Segundo, California 


INTRODUCTION 

If any design goal is common to all computer system 
organization schemes, it is that of providing “effective 
service” both externally to the user of the computational 
facility and internally with respect to utilization of 
system resources. Thus, generally speaking, there are at 
least two dimensions to this design objective. On the one 
hand, effective service is the external satisfaction of a 
broad spectrum of user demands. For example, the ideal 
system might be visualized as one which economically 
provides a large number of programming languages; 
machine compatibility with other computers of widely 
diverse hardware; and rapid computation. On the other 
hand, effective service is the internal utilization of all 
system components so as to increase computational 
efficiency. In this respect, system structures are im- 
plemented which strive to maximize sub-system 
simultaneity and system throughput. For example, a 
degree of macro-parallelism is attained in many present 
day systems by allowing a central processing unit (CPU) 
and input/output controller to share the use of a main 
memory register, thereby enabling processing and 
input/output (I/O) to proceed concurrently (for one or 
several independent programs, depending upon the 
system software) . 

In general, external effectiveness is all that the user 
sees, and it is therefore of primary interest to him, 
whereas the purveyor of the equipment is vitally 
concerned with internal utility and coordination. 
However, this latter consideration indirectly relates to 


the quality of service the user receives (his waiting time 
for service completion, the price he is charged for 
service, etc.). 

The ramifications of hardware and software designs to 
achieve such service can be investigated both internally 
and externally; yet, a particular design strategy need 
not supplement effective service from both viewpoints. 
On the contrary, schemes tailored to improve external 
utilization often degrade internal service effectiveness 
and vice versa. Unfortunately, in confronting these 
design trade-offs, the designer often had to rely upon 
heuristic and intuitive arguments, since there is a 
general lack of design models which quantitatively 
relate system variables to reflect a priori performance 
estimates. Hence, the design is complicated not only by 
trade-offs between the often dissimilar aims of external 
and internal effective service, but also by a deficiency of 
design tools for investigating various implementation 
alternatives. 

These problems are especially amplified with the 
advent of time-shared computer systems. In time- 
sharing systems, an ideal goal is to respond to interactive 
on-line users such that each user receives the impression 
that he has his own computer, yet at a price he can 
afford. Thus in these systems, the computer complex is 
shared among a number of independent users who are 
concurrently communicating with the system, generat- 
ing programs and interactive service requests via 
on-line remote terminal equipment. This action enables 
one to achieve economies of scale and distribute the cost 
of the system among all users according to their usage 
of the facilities. Similarly, the objective of rapid response 
is realized by time slicing CPU service and sharing it 
among the on-line users. A request for program execution 
is not necessarily serviced to completion; but rather jobs 
are granted finite intervals (quanta) of processing time. 
If a job fails to exhaust its demands during a quantum 
allocation, then it is truncated and postponed according 
to a scheduling discipline, thereby facilitating rapid 
response to short requests. 1-4 This preferential treatment 
of short jobs increases the programmer’s productiveness, 
since one-attempt efforts, editing, debugging, and other 
typically short interactive demands often encounter 
exorbitant turn-around times in batch processing 
environments (i.e., in relation to the amount of actual 
processing time consumed, due to problems of key 
punching, printer output, card stacking, and total 
system demand). 

However, since computation is not necessarily run to 
completion and main memory size is limited (by both 
economic and physical reasons), programs must be 
swapped into and out of main memory as the CPU 
commutates its service from request to request. 
Therefore, unless swapping is achieved with no loss in 
time, it is obvious that service in the time-sharing sense 
is less efficient in CPU utilization than service to 
completion. Also, the time spent scheduling, allocating 
buffers, and controlling swap input/output represents 
overhead or wasted processing time which, due to 
incomplete servicing, is greater in time-sharing systems 
than batch processing systems. Furthermore, if the 
system is dedicated to servicing on-line requests, the 
CPU is essentially idle during periods of low on-line 
input traffic. Hence, a design compromise must be 
attained between external response rapidity and internal 
efficiency since system performance, in the general case, 
is a function of both response to selected classes of users 
and utilization of system resources. 

Yet, exploring such problem areas prior to design is 
complicated, because any performance investigation is 
incorrigibly statistical. Performance is not only a 
function of software characteristics such as the input/ 
output, memory, and processing requirements of each 
on-line request together with the occurrence rate of such 
requests, but also dependent upon hardware characteristics 
such as the instruction processing rate and the rates 
of access to secondary memory. 

This paper presents one approach to mitigating some 
of these difficulties. A system design is briefly described 
and then analyzed utilizing a mathematical model. The 
system is structured to accommodate both batch and 
time-sharing users with the goal being to achieve a 


balance of system efficiency and responsiveness. A set 
of variables are defined which characterize on-line user 
demands and the servicing capacity of various units 
within the system. These variables are then quantita- 
tively related in a mathematical model to derive salient 
performance measures. Examples are given which 
graphically display these measures versus various ranges 
of the system variables. These a priori performance 
estimates are then compared with empirical data 
extracted from the system during its actual operation. 
Here the emphasis is given to mathematical modeling 
because this analysis method is more expedient and 
generally less costly than the alternative approach of 
simulation. Moreover, since many of the variables are 
non-independent and rely upon characterization of user 
demands, and since these are difficult to accurately 
describe prior to actual operation, the macroscopic and 
statistical indications provided by a mathematical model 
are perhaps all that one can feasibly obtain. 

Design and performance study 

System design 

The Batch/Time-Sharing Monitor (BTM) is designed 
to afford SDS Sigma 5 and Sigma 7 users with interactive 
and on-line time-sharing without disrupting batch 
operations. For considerations of efficiency, the primary 
objective of the BTM design is to provide limited time- 
sharing service while concentrating on throughput of 
batch jobs — the servicing of time-sharing users is 
allocated to minimize response for interactive users with 
no special service given to the compute bound on-line 
users (because high-efficiency batch service is available). 

Thus, the system is structured with resources for the 
batch and time-sharing portions of the system separated 
as much as possible. Different areas of main memory are 
allocated so that a (compute bound) batch user is 
always “ready to run.” The file device is common 
because files may be shared between batch and time- 
sharing users. However, the management technique 
used minimizes the interference from this factor. The 
swapping Rapid Access Disc (RAD) for time-sharing 
users is independent of the file device, thus insuring that 
swaps in process do not affect on-going batch programs. 

The batch user is kept essentially compute bound by 
buffering all of his unit record I/O via a RAD. This 
allows the compute portion of each job to follow that 
of the previous job without waiting for the printout, 
etc., to complete. Thus, there is no need to attempt to 
reclaim swap time from one time-sharing user to 
another, because a natural claimant, the batch job, is 
readily available. 

Hence, a very simple (and low overhead) swapping 
and scheduling algorithm can be used. As a particular 
user is dismissed, other users are polled in turn to see 
who is “ready to run.” If someone is found (not the 
same user), a replacement swap is initiated and the 
CPU is allocated to the batch job. When the swap-out/ 
swap-in is complete, the new user is given one quantum 
(i.e., providing the batch job has already had at least its 
quantum) ; then the cycle is repeated. 

In this way, batch is guaranteed a certain percentage 
of the machine (and typically gets much more), and a 
moderate number of time-sharing users receive rapid 
response to conversational requests. Yet with this 
relatively simple framework, a number of questions are 
unavoidable: How do on-line response and batch 
throughput vary with the number of on-line users, and 
how do other variables such as quantum size and swap 
time relate to system performance? Moreover, how 
does one characterize system performance and the 
variables which influence it? 

Parameterizations and performance measures 

The subject of “on-line” response is unfortunately 
plagued by many interpretations of what constitutes 
response (and, moreover, what defines adequate 
response). For the purposes of this paper, “typical 
on-line requests” are those which require minimal 
central processor time — less than one quantum alloca- 
tion. Thus, the response time Ci to a “typical on-line 
demand” is that period elapsing between request 
generation (the keying in of a control character such as 
“carriage return”) and the termination of the first time 
quantum* which is allocated to the servicing of the 
request. This definition provides the basis upon which 
the on-line performance of the BTM system is analyzed 
in this paper, since it is assumed that on-line users are 
typically in phases of program preparation.** Thus, 
providing the quantum is large enough, the great 
majority of user interactions (e.g., “open the next 
line,” “delete source image,” “perform syntax check 
and insert into text,” etc.) can be satisfied with single 
quantum allocations. 

The mathematical model developed in the Appendix 
enables one to characterize the system by selecting 
values for the variables: 

N = total number of active on-line communication 
sources (i.e., the number of remote users who 
are concurrently using the system). 

λ = average user interaction rate (frequency at 
which a single user requests service by the 
CPU). 

μ = mean rate at which on-line requests are 
serviced by the CPU (1/μ = average 
amount of CPU time required to complete 
each request given that the CPU was 
dedicated to the servicing of the request). 

S = the average amount of time required to swap 
an old user out of core and load a new user 
(clearly, S is dependent upon the swapping 
device as well as program size). 

q_R = time quantum allocated to on-line requests 
(time-sharing users). 

q_B = time quantum given to batch requests 
(background users). 

m = the average cumulative quantum extension 
(for monitor services such as scheduling, file 
I/O, service calls, etc.) incurred during the 
period elapsing between successive quantum 
allocations to on-line jobs. 


* Also note that if the scheduling algorithm is round-robin then 
Ci provides a basis for approximating the response time for 
a request which requires multiple quanta [4]. 
** Note that this is not the case in system environments in which 
the on-line users run production (compute bound) programs. 

To supplement analysis efforts, the BTM system 
software is capable of monitoring these (and other) 
variables and accumulating their statistical distributions 
during actual system operation. This does not impose 
any significant overhead since much of this data is 
already accumulated in the accounting log, and (as in 
many other commercial systems) used as a basis for 
charging users. 

Upon establishing reasonable values for the above 
variables, the model can then be used to derive per- 
formance measures. In terms of response, the salient 
performance index is E[Ci], where 

E[Ci] = the expected response time which “typical 
on-line demands” experience (see the definition 
given above). 

In addition, the model can readily be used to estimate 
the percentage of CPU time available for batch jobs; the 
percentage of CPU time received by time-sharing users; 
utilization of the swapping RAD; expectations of 
system revenues; and a variety of other indices obtained 
from combinations of the derived parameters. 

Figure 1: E[Ci] vs. N, the number of concurrent users (μ = 2.5 requests/sec.) 

Figure 2: E[Ci] vs. N, the number of concurrent users (μ = 5 requests/sec.) 

Figure 3: Relative batch capability 

Figure 4: N_max vs. CPU speed μ 

A priori estimates for some of these performance 
measures are given in Figures 1-5 for reasonable ranges 
of the variables N, λ, μ, S, q_R, q_B, and m. Obviously, 
these variables will differ from one environment to 
another. Therefore, before discussing conclusions which 
can be drawn from these graphical results, it is appro- 
priate to clarify the parameterizations and assumptions 
which were used in the calculations: 

1. The average swap time S was conservatively 
calculated assuming that four RAD accesses are 
required per swap with an average total of 16K 
words transferred during each swap. (The RAD’s 
are head per track rotating memories operating 
at 1800 rpm; and the SDS model 7204, 7232 and 
7212 RAD’s transfer data at rates of 187 × 10^3 
bytes/sec., 384 × 10^3 bytes/sec. and 3 × 10^6 
bytes/sec., respectively.) 

2. The user interaction rate λ was estimated from 
statistics gathered at RAND [5] and other data 
extracted from the GE/Dartmouth BASIC 
system [6] and the SDS 940 system. 

3. The selection of q_R = 200 ms. was established 
such that the majority of user interactions are 
satisfied with single quantum allocations, whereas 
selecting q_B = 85 ms. and 200 ms. was done 
merely to demonstrate “swap limited” and 
“batch limited” operation, respectively. 

4. The value of the average monitor time m per 
on-line/batch quantum cycle was approximated 
utilizing batch accounting information and 
timing studies of monitor services. 

5. Values of μ were chosen such that the average 
on-line quantum actually used would be ≈ 125 ms. 
to 150 ms. when q_R = 200 ms. was allocated. This 
selection was inferred from data extracted from 
the SDS 940 System and BTM code traces. (Yet, 
note that a single parameter μ does not provide a 
characterization covering the more general case 
in which the processing time distribution is 
multi-modal.† However, for purposes of studying 
interactive response, it provides a good approx- 
imation and lends itself to the mathematical 
analysis.) 

Figure 5: E[Ci] vs. q_R (N = 18) 


† The multi-modal case arises because of a multiplicity of lan- 
guage facilities and the natural division of requests into interactive 
or compute demands. 
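
The swap time parameterization in item 1 can be turned into a rough calculation. The sketch below is not the authors' computation: it assumes that each RAD access costs an average rotational latency of half a revolution, that a Sigma word is 4 bytes (32 bits), and that no other delays contribute. The function name and parameter names are hypothetical, and the resulting figures are illustrative only.

```python
def swap_time(transfer_rate_bytes_per_sec,
              accesses=4,
              words=16 * 1024,
              bytes_per_word=4,
              rpm=1800,
              latency_fraction=0.5):
    """Estimate the swap time S in seconds.

    latency_fraction is the assumed fraction of a revolution lost per
    access (0.5 for an average half-revolution wait; 1.0 would be a
    more conservative full revolution).
    """
    revolution = 60.0 / rpm
    latency = accesses * latency_fraction * revolution
    transfer = words * bytes_per_word / transfer_rate_bytes_per_sec
    return latency + transfer


if __name__ == "__main__":
    for model, rate in (("7204", 187e3), ("7232", 384e3), ("7212", 3e6)):
        print(model, round(swap_time(rate) * 1000, 1), "ms")
```

Under these assumptions the fastest device yields a swap time of somewhat under 90 ms, the same order as the 85 ms batch quantum used for the “swap limited” case, while the slower devices are dominated by transfer time.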


Mathematical results 

Given this framework, let us now turn our attention 
to the figures. Employing the mathematical model, 
a priori estimates of average interactive response time 
E[Ci] are displayed versus N in Figure 1 and Figure 2 
for μ = 2.5 requests/sec. and μ = 5 requests/sec., 
respectively. Here, three different curves are plotted in 
each figure to demonstrate the limiting effects of each 
swapping device (i.e., “swap limited” operation when 
the batch quantum q_B is less* than the swap time S). 
Also, note that an additional curve is given for the 
model 7212 RAD to display the effects of selecting a 
batch quantum which exceeds the swap time (i.e., 
“batch limited” operation). This latter curve shows that 
the fastest swapping device effectively becomes a slower 
device when q_B is set such that operation is “batch 
limited”: the model 7212 RAD is almost equivalent to 
a model 7232 RAD when q_B = 200 ms. 

Now since N is the total number of concurrent users 
(active communication sources), Figures 1 and 2 enable 
one to estimate a value for the maximum number of 
users Nmax which the system can simultaneously 
accommodate by: (1) assuming “swap limited” operation 
and (2) defining what constitutes adequate response to 
typical on-line demands. For example, if one assumes 
that adequate interactive response is achieved if ≈ 80% 
of the time a user experiences a delay of less than 5 sec., 
then, depending upon μ, one concludes:** 

i. the model 7204 RAD will accommodate a 
maximum of 10 to 16 concurrent users for*** 
μ = 2.5 requests/sec. to μ = 5 requests/sec., 
respectively; 

ii. the model 7232 RAD will accommodate a 
maximum of 16 to 26 concurrent users for 
μ = 2.5 requests/sec. to μ = 5 requests/sec., 
respectively; 

iii. the model 7212 RAD will accommodate a 
maximum of 26 to 38 users for μ = 2.5 requests/ 
sec. to μ = 5 requests/sec., respectively. 

However, the actual number of on-line users who
concurrently use the system is a statistical parameter
which generally is less than $N_{\max}$ and varies according
to the total number of on-line subscribers, their
demands, processing speed, $N_{\max}$, etc. In practice, the
total number of on-line subscribers typically exceeds
$N_{\max}$ by at least a factor of three.

* For this situation, the actual batch quantum allocation is the
swap time S.

** These conclusions were made by assuming that the probabil-
ity distribution for response time $C_1$ is such that twice the mean
$E[C_1]$ is (at least) the 80 percent point. This is a reasonable assump-
tion in light of both the mathematical characterizations used in
the model and empirical measurements.

*** Note that reducing $\mu$ from 5 requests/sec. to 2.5 requests/sec.
is tantamount to reducing processing speed by a factor of 1/2.

For the above cases, nominally 50-80% of the CPU 
time is available for batch jobs. This is shown in 
Figure 3. Similarly, utilizing this same response 
criterion, it is interesting to observe the effects of 
increasing**** CPU speed $\mu$. This is demonstrated in
Figure 4 for each of the swapping devices. As CPU speed 
increases indefinitely, the capacity of the system to 
service on-line requests approaches a limit established 
by the swapping device. 

Additional insight into system responsiveness is
provided by Figure 5. Here, $E[C_1]$ is graphically
displayed versus the on-line user quantum $q_R$ for “swap
limited” operation and N = 18 (with all other variables
the same as those employed in Figures 1 and 2). Note
that the selection of a minimum $q_R$ is very critical;
however, having established a minimum $q_R$, the varia-
tions are not dramatic for a relatively large range above
minimum $q_R$. Also, notice that as $\mu$ is reduced from 5
requests/sec. to 2.5 requests/sec., a model 7232 RAD
must be used to achieve what a model 7204 RAD
accomplished in the former case; and similarly, a model
7212 RAD is required to equal the performance of a
model 7232 RAD.

Experimental results 

Extensive statistics were gathered from the system 
(while running typical jobs) with a twofold purpose in 
mind. First, it was necessary to substantiate the validity 
of the assumptions employed in the model; i.e., establish 
that the chosen parameters were indeed consistent with 
the actual environment. Secondly, a correlation between 
empirically measured performance and the results of the 
model would lend credence to the validity of the model, 
and therefore allow us to extrapolate and predict 
performance for other user environments and system 
configurations. 

The first objective was accomplished by observing a
BTM system which used a model 7212 RAD for
swapping with quanta $q_R = q_B = 200$ ms. Values for
$\lambda$, $\mu$, m and program size were tabulated for many
different observation periods. For each of these monitor-
ing sessions different average values were obtained, but
the values $\mu = 3.5$ requests/sec., $\lambda = 1$ request/15
user-sec., S = 85 msec., and m = 100 msec. were found
to be quite representative of most samples. The variables
$\mu$ and $\lambda$ were most subject to variation and ranged from
2 to 6 requests/sec. and from 1 request/25 user-sec. to
1 request/10 user-sec., respectively. Also, the data
indicated that the assumptions of exponentially dis-
tributed CPU time and request inter-arrival time
provided good approximations of user demands.

**** Note that this latitude is only possible on a limited basis
(e.g., code optimization, faster memory, faster operation unit,
multi-processing, etc.)

Given that the first objective was satisfied, realization 
of the second objective is buttressed by Figure 6 which 
plots the average of all sampled values for two of the key 
performance indications (average response time $E[C_1]$
and CPU time available for batch Pr[B]) as a function 
of the number of users N. Upon comparing these results 
with the mathematical predictions (also see Figures 1-3), 
one can infer that (at least for the range of variables 
considered) the mathematical model is reasonably 
consistent with actual system operation. 

Comments 

The analysis presented above primarily focused atten-
tion on the system's capacity to accommodate user
demands. Even though no mention was made of
cost/performance tradeoffs, the model lends itself to
this latter design consideration. For example, the
variables N, Pr[B], and $\mu$ might be combined to reflect
the revenue derived for service to batch jobs and the
revenue obtained for servicing interactive users which
could then be weighed against the cost expended to
provide (and maintain) the system complement. This
would provide a basis for the designer to balance CPU
cost/performance with that of other system elements.

Figure 6 - Empirical results

The process of selecting and examining performance 
indexes similar to those discussed here enables the 
designer to better appraise the many implementation 
tradeoffs which confront him. Moreover, when supple- 
mented with empirical data, these techniques provide a 
basis for not only configuring existing systems but also 
synthesizing new systems. However, it should be 
emphasized that apart from the mathematical model 
itself and its macroscopic treatment of the system, the 
fidelity of the results and conclusions obtained in this 
analysis (or any analysis of this sort) can only be as good 
as the accuracy attributed to the independent variables 
(N, $\lambda$, $\mu$, m, S). The values possessed by these variables
dramatically affect performance and will vary from one 
environment to another. Therefore, one should be 
cautious before inferring any explicit and universal 
characterizations of system performance. 


APPENDIX

BTM mathematical model

Consider that the generation of on-line requests on each
communication channel is an exponential process with
parameter $\lambda$. Hence, the time interval x between
completion of a request and generation of a new request
on a given line is described by the distribution function

$$
A(x) = \begin{cases} 1 - e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases}
$$

Similarly, assume that the service time t required by
each on-line request is exponentially distributed with
parameter $\mu$ and characterized by the distribution
function

$$
B(t) = \begin{cases} 1 - e^{-\mu t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases}
$$

Given that there are N channels, let $p_n(t)$ denote the
probability that n on-line requests are queued at some
arbitrary time t for n = 0, 1, ..., N; then

$$
\frac{dp_n(t)}{dt} =
\begin{cases}
-N\lambda\, p_0(t) + \mu \Pr[R(t)]\, p_1(t) & \text{for } n = 0 \\
-\left[(N-n)\lambda + \mu \Pr[R(t)]\right] p_n(t) + (N-n+1)\lambda\, p_{n-1}(t) + \mu \Pr[R(t)]\, p_{n+1}(t) & \text{for } 0 < n < N \\
-\mu \Pr[R(t)]\, p_N(t) + \lambda\, p_{N-1}(t) & \text{for } n = N
\end{cases}
$$

where Pr[R(t)] denotes the probability that at time t
the computer is servicing one of the remotely generated
on-line requests. Note that in the above equations, the
input rate is $(N - n)\lambda$ when n requests are queued.
Thus the model accounts for the natural variations in
demand intensity which result because there are a finite
number N of input sources.

From these equations, the stationary probability⁷
that n on-line requests are queued is

$$
p_n = \frac{N!}{(N-n)!} \left( \frac{\lambda}{\mu \Pr[R]} \right)^{n} p_0
$$

where

$$
\Pr[R] = \lim_{t \to \infty} \Pr[R(t)]
\quad \text{and} \quad
p_0 = \frac{1}{1 + \sum_{n=1}^{N} \frac{N!}{(N-n)!} \left( \frac{\lambda}{\mu \Pr[R]} \right)^{n}}
$$
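The stationary solution above is easy to evaluate numerically. The following Python sketch computes $p_0, \ldots, p_N$ for a given Pr[R]. The function name `stationary_probs` and the sample value Pr[R] = 0.5 are assumptions made only for illustration; N, $\lambda$ and $\mu$ are taken from the values quoted in the body of the paper.

```python
from math import factorial

def stationary_probs(N, lam, mu, pr_R):
    """Return [p_0, ..., p_N] for N sources, request rate lam per idle
    source, and effective service rate mu * pr_R."""
    rho = lam / (mu * pr_R)
    # Unnormalized weights N!/(N-n)! * rho**n; the n = 0 weight is 1.
    weights = [factorial(N) // factorial(N - n) * rho ** n for n in range(N + 1)]
    total = sum(weights)              # equals 1 / p_0
    return [w / total for w in weights]

# pr_R = 0.5 is a hypothetical value, not a result from the paper.
probs = stationary_probs(N=18, lam=1 / 15, mu=3.5, pr_R=0.5)
mean_queue = sum(n * p for n, p in enumerate(probs))
print(f"p_0 = {probs[0]:.4f}, E[n] = {mean_queue:.3f}")
```

Normalizing the weights in one pass avoids computing $p_0$ separately, since the sum of the weights is exactly the denominator of the expression for $p_0$.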



The probability Pr[R] can be estimated by considering
the interval which elapses between successive allocations
of a quantum to on-line users. Let $T_k$ denote the total
time between the 0th on-line quantum completion and
the kth on-line quantum completion. If the kth completion
leaves the on-line queue in an empty state, then the
expected value of the time $\Delta T_k$ until the next on-line
quantum completion is

$$
E[\Delta T_k \mid n = 0] = \frac{1}{N\lambda} + \bar{q}_B + \bar{q}_R + m
$$

where $\bar{q}_B$ is the average quantum which batch users
receive; $\bar{q}_R$ is the expected duration of an on-line
(remote user) quantum; $1/N\lambda$ is the mean time until
the generation of the next on-line request; and m is the
expected monitor overhead time per batch/on-line
quantum cycle. Here, m accounts for any scheduling,
I/O overhead, file operations, and any other CPU time
pre-empted by the monitor which results during the
cycle of a quantum allocation to a batch job followed by
a quantum allocation to an on-line job.

In the case when the kth on-line quantum completion
does not leave the interactive user queue empty, then
with probability $(1 - p_0)$

$$
E[\Delta T_k \mid n > 0] = \bar{q}_B + \bar{q}_R + m
$$

Now let $T_B$, $T_R$, and $T_m$ denote the length
of time out of $T_k$ which the system spends servicing
batch jobs, on-line jobs, and monitor functions,
respectively.

Then as k goes to infinity, the ratios $T_B/k$, $T_R/k$, and
$T_m/k$ converge with probability one to $(\bar{q}_B + p_0/N\lambda)$,
$\bar{q}_R$, and m, respectively. Therefore, in the limit, an
approximation to the fraction of the time which the
system spends servicing on-line requests is

$$
\Pr[R] = \lim_{k \to \infty} \frac{T_R}{T_k}
       = \lim_{k \to \infty} \frac{T_R/k}{T_k/k}
       = \frac{\bar{q}_R}{\bar{q}_R + \bar{q}_B + m + p_0 \left( \frac{1}{N\lambda} \right)}
$$

Then, noting that $\bar{q}_B + \bar{q}_R$ is essentially the computa-
tion cycle, this leads one to express Pr[R] as

$$
\Pr[R] = \frac{\bar{q}_R}{f\,(\bar{q}_B + \bar{q}_R)}
\quad \text{where } f > \left( 1 + \frac{m}{\bar{q}_B + \bar{q}_R} \right)
$$

Here, f is an appropriate scale factor introduced to
facilitate solving for $\{p_n\}$. The numerical technique
is to let f increase by some small $\Delta f$ until a solution for
$p_0$ is obtained which is consistent with Pr[R]. The
variable f satisfying this criterion will vary dramatically
depending upon N, m, $\mu$, $\lambda$ and $\bar{q}_B$.

Upon solving for $p_0$, the percentage of CPU time
available for batch jobs is

$$
\Pr[B] = \frac{\bar{q}_B + p_0\,(1/N\lambda)}{\bar{q}_R + \bar{q}_B + m + p_0\,(1/N\lambda)}
$$

The variables $\bar{q}_B$ and $\bar{q}_R$ are heavily influenced by
quantum periods and swap time. If one assumes that
(with the exception of a batch quantum allocation every
other quantum) on-line jobs run on a demand basis
(i.e., the batch quantum $q_B$ is less than the swap time S),
then $\bar{q}_B = S$. Hence, the swap time limits the rate at
which successive quantum allocations are provided to
the on-line requests (i.e., maximum service capacity is
given to on-line requests). Whereas, if the batch
quantum limits the servicing of on-line requests
($q_B \ge S$), then $\bar{q}_B = q_B$. Therefore, for completeness

$$
\bar{q}_B = \begin{cases} q_B & \text{if } S \le q_B \\ S & \text{if } S > q_B \end{cases}
$$

and from the exponential distribution of service time
for on-line requests¹⁻⁴

$$
\bar{q}_R = \frac{1}{\mu} \left( 1 - e^{-\mu q_R} \right)
$$

Given the foregoing relations, it is now possible to
estimate the expected cycle time $E[C_1]$ which an
individual on-line (remote user) request experiences
before it is granted its first quantum allocation. As
emphasized in an earlier section, attention is focused
upon $E[C_1]$ since it provides an indication of the respon-
siveness of the system to handling “typical on-line
requests” which require less than one quantum of CPU
time. By considering the system only at epochs of
transition between batch to on-line, monitor to batch
(or on-line), idle to batch (or on-line), etc., one obtains
the approximation

$$
E[C_1] \approx p_0\, E[T_0] + E[n]\,(\bar{q}_B + \bar{q}_R + m) + \bar{q}_R
$$

where

$$
E[n] = \sum_{n=1}^{N} n\, p_n
$$

and $E[T_0]$ is the expected time remaining subsequent to
the arrival of an on-line request before the next quantum
allocation is initiated. The value of $E[T_0]$ is difficult to
accurately express since it is a function of the probability
densities for $q_B$ and m together with machine state
probabilities; however, it is clear that

$$
E[T_0] \le \bar{q}_B + m
$$
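The numerical technique described above (raising f in small steps until $p_0$ is consistent with Pr[R]) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the function names, the step size `df`, and the use of the bound $\bar{q}_B + m$ in place of $E[T_0]$ are choices made for this example. The parameter values are the representative ones quoted in the experimental results, with N = 18.

```python
from math import exp, factorial

def queue_probs(N, lam, mu, pr_R):
    """Stationary p_0..p_N of the finite-source queue for a given Pr[R]."""
    rho = lam / (mu * pr_R)
    w = [factorial(N) // factorial(N - n) * rho ** n for n in range(N + 1)]
    total = sum(w)
    return [x / total for x in w]

def solve_model(N, lam, mu, q_B, q_R, S, m, df=1e-3):
    qb = max(q_B, S)                      # mean batch interval (S when swap limited)
    qr = (1.0 - exp(-mu * q_R)) / mu      # mean on-line quantum actually used
    cycle = qb + qr
    f = 1.0 + m / cycle                   # lower bound on the scale factor
    while True:
        pr_R = qr / (f * cycle)
        p = queue_probs(N, lam, mu, pr_R)
        idle = p[0] / (N * lam)           # p_0 times mean wait for the next request
        f_implied = 1.0 + (m + idle) / cycle
        if f >= f_implied:                # p_0 is now consistent with Pr[R]
            break
        f += df
    pr_B = (qb + idle) / (qr + qb + m + idle)
    e_n = sum(n * pn for n, pn in enumerate(p))
    # E[T_0] is replaced by its bound qb + m, so this is an upper estimate.
    e_c1 = p[0] * (qb + m) + e_n * (qb + qr + m) + qr
    return {"f": f, "f_implied": f_implied, "p0": p[0], "pr_R": pr_R,
            "pr_B": pr_B, "E_n": e_n, "E_C1_upper": e_c1}

result = solve_model(N=18, lam=1 / 15, mu=3.5, q_B=0.2, q_R=0.2, S=0.085, m=0.1)
for name, value in result.items():
    print(f"{name:>10}: {value:.4f}")
```

The scan terminates because the value of f implied by $p_0$ falls as f rises: a larger f lowers Pr[R], which raises the queue load and lowers $p_0$.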


At any rate, $E[T_0]$ is not a dominant factor in $E[C_1]$
unless $E[C_1]$ is extremely small (i.e., $E[C_1] \approx \bar{q}_R + E[T_0]$,
for example). Hence, the precise value of $E[T_0]$ is not
critical in those cases which are of particular interest
(namely, those resulting when the on-line queue tends
toward saturation; i.e., $E[n] \to N$).

In addition to the above result for $E[C_1]$, since the
scheduling discipline is round-robin, it is possible to
estimate²,⁴ the expected total response time $E[R \mid t]$ for
an on-line request which requires a processing time t in
excess of a single quantum $q_R$

$$
E[R \mid t] \approx t + \langle t/q_R \rangle \left[ E[C_1] - (p_0\, E[T_0] + \bar{q}_R) + \bar{q}_B + m \right]
$$

where $\langle a/b \rangle$ is the smallest integer greater than a/b.

Alternate model

Let $p_{mn}(T_k)$ denote the probability that n on-line
requests are queued at epoch $T_k$ marking the completion
of the kth on-line quantum allocation, given that at
epoch $T_{k-1}$ there were m on-line requests awaiting
service from the system.¹,² Then independent of k
since the CPU servicing of requests is characterized as
an exponential process

$$
p_{mn} =
\begin{cases}
\displaystyle \int_{0}^{y} \Pr[n-m \mid m,\, q_R + t]\, p_B(t)\, dt
 + \int_{0}^{y+q_R-\epsilon} \Pr[n-m+1 \mid m,\, t]\, p_{B+R}(t)\, dt
 & \text{for } 1 \le m \le n \\[2ex]
0 & \text{for } n \le m-2;\ m > 1 \\[1ex]
\displaystyle \int_{0}^{y+q_R-\epsilon} \Pr[0 \mid m,\, t]\, p_{B+R}(t)\, dt
 & \text{for } n = m-1 \ge 0
\end{cases}
$$

where $\epsilon \to 0$ and $\Pr[k \mid m, t]$ denotes the conditional
probability of generating k new on-line requests in a
time interval t given that m requests are queued. For
example, with exponential inter-arrival

$$
\Pr[k \mid m, t] = \binom{N-m}{k} \left( 1 - e^{-\lambda t} \right)^{k} \left( e^{-\lambda t} \right)^{N-m-k}
$$

Also, in the above equations

$$
y = \begin{cases}
S_{\max} & \text{if service to on-line customers is swap limited (i.e., } q_B < S) \\
q_B & \text{if batch quantum limits on-line service (i.e., } q_B \ge S)
\end{cases}
$$

Here, $p_B$ denotes the probability density function which
describes the batch quantum allocation, and $p_{B+R}$ is
the convolution of $p_B$ with the density function $p_R$
defining the distribution of an on-line quantum alloca-
tion. Both $p_B$ and $p_R$ include overhead functions to
account for file I/O, monitor overhead, etc.

The density function $p_B$ is derived from the swap time
distribution when $q_B < S$; whereas, it depicts the CPU
servicing of batch requests when $S \le q_B$. For example,
in the latter case with $\delta(z)$ representing the Dirac-delta
function describing an independent variable z, one
could characterize the constant batch allocation interval
by

$$
p_B(t) = \delta(t - (y_B + q_B))
$$

where the constant $y_B$ reflects batch overhead. Similarly,
letting $y_R$ denote the overhead incurred during an
on-line quantum allocation

$$
p_R(t) = \begin{cases}
0 & \text{for } t < y_R \text{ or } t > y_R + q_R \\
\mu e^{-\mu (t - y_R)} + e^{-\mu q_R}\, \delta(t - (q_R + y_R)) & \text{for } y_R \le t \le y_R + q_R
\end{cases}
$$

For completeness, the transitions from the 0-state are
assumed to be

$$
p_{0n} = \int_{0}^{\infty} \Pr[n \mid 0, t]\, p_B(t)\, dt
$$

Then, having formulated the state transitions $\{p_{mn}\}$
and defined the density functions $p_B(t)$ and $p_{B+R}(t)$, the
problem remains to solve for the steady-state proba-
bilities. This is accomplished by noting that the $p_{mn}$'s
define an ergodic Markovian chain whereby in matrix
form with $P = (p_{mn})$ there exists a unique set of numbers
$\{p_m\}_{m=0}^{N}$ such that

$$
(p_0\ p_1\ \cdots\ p_N)\, P = (p_0\ p_1\ \cdots\ p_N)
$$

and

$$
\sum_{n=0}^{N} p_n = 1
$$

The solution of these equations produces the limiting
stationary probabilities $\{p_n\}_{n=0}^{N}$ which could be used in
calculating E[n] to provide a more accurate estimate of
$E[C_1]$. (That is, providing one can accurately describe
$p_B$, $p_{B+R}$, $\lambda$, etc.).
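To make the last step concrete, the Python sketch below builds a small transition matrix and finds the stationary probabilities by repeated multiplication (power iteration). It is a deliberately simplified stand-in for the alternate model: every cycle is assumed to last a fixed time `tau` and to serve exactly one queued request, so the densities $p_B$ and $p_{B+R}$ are not modeled. The function names and the values of N and `tau` are assumptions for illustration only; the arrival probability is the binomial form given for exponential inter-arrival.

```python
from math import comb, exp

def pr_arrivals(k, m, t, N, lam):
    """Pr[k | m, t]: k of the N - m idle sources generate a request within t."""
    idle = N - m
    if not 0 <= k <= idle:
        return 0.0
    a = 1.0 - exp(-lam * t)
    return comb(idle, k) * a ** k * (1.0 - a) ** (idle - k)

def transition_matrix(N, lam, tau):
    """Simplified p_mn: each cycle lasts tau and completes one queued request."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for m in range(N + 1):
        for n in range(N + 1):
            k = n if m == 0 else n - m + 1    # new arrivals needed to reach n
            P[m][n] = pr_arrivals(k, m, tau, N, lam)
    return P

def stationary(P, iters=10_000, tol=1e-13):
    """Solve pi P = pi with sum(pi) = 1 by power iteration."""
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(iters):
        nxt = [sum(pi[m] * P[m][n] for m in range(size)) for n in range(size)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

P = transition_matrix(N=6, lam=1 / 15, tau=0.44)
pi = stationary(P)
print([round(x, 4) for x in pi])
```

Power iteration is used here only because it needs no linear-algebra library; any method for solving the same linear equations would serve.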

However, since the accuracy of such variables would 
be highly questionable in the absence of any empirical 
information and since this latter model presents a 
number of non-trivial mathematical difficulties, it was 
not utilized to derive the results given in this paper. 
Yet, in the future, as sufficient data is accumulated from
the actual operation of BTM systems, the latter
model will enable us to extrapolate and better predict
the effects of alterations to the system (e.g., improve-
ments resulting from faster swapping devices or
increases in CPU speed).
Dynamic protection structures

INTRODUCTION

A very general problem which pervades the entire field 
of operating system design is the construction of pro- 
tection mechanisms. These come in many different 
forms, ranging from hardware which prevents the exe- 
cution of input/output instructions by user programs, 
to password schemes for identifying customers when 
they log onto a time-sharing system. This paper deals 
with one aspect of the subject, which might be called 
the meta-theory of protection systems: how can the 
information which specifies protection and authorizes 
access, itself be protected and manipulated. Thus, for 
example, a memory protection system decides whether a 
program P is allowed to store into location T. We are 
concerned with how P obtains this permission and how 
he passes it on to other programs. 

In order to lend immediacy to the discussion, it 
will be helpful to have some examples. To provide 
some background for the examples, we imagine a 
computation C running on a general multi-access 
system M. The computation responds to inputs from 
a terminal or a card reader. Some of these look like 
commands: to compile file A, load B and print the 
output double-spaced. Others may be program state- 
ments or data. As C goes about its business, it executes 
a large number of different programs and requires at 
various times a large number of different kinds of 
access to the resources of the system and to the various 
objects which exist in it. It is necessary to have some 
way of knowing at each instant what privileges the 
computation has, and of establishing and changing 
these privileges in a flexible way. We will establish a 
fairly general conceptual framework for this situation, 


and consider the details of implementation in a specific 
system. 

Part of this framework is common to most modern 
operating systems; we will summarize it briefly. A 
program running on the system M exists in an environ- 
ment created by M, just as does a program running in 
supervisor state on a machine unequipped with soft- 
ware. In the latter case the environment is simply the 
available memory and the available complement of 
machine instructions and input/output commands; 
since these appear in just the form provided by the 
hardware designers, we call this environment the bare 
machine. By contrast, the environment created by M
for a program is called a virtual or user machine.⁶ It
normally has less memory, differently organized, and 
an instruction set in which the input/output at least 
has been greatly changed. Besides the machine reg- 
isters and memory, a user machine provides a set of 
objects which can be manipulated by the program. The 
instructions for manipulating objects are probably 
implemented in software, but this is of no concern to 
the user machine program, which is generally not able 
to tell how a given feature is implemented. 

The basic object which executes programs is called 
a task or process;⁵ it corresponds to one copy of the
user machine. What we are primarily concerned with 
in this paper is the management of the objects which 
a process has access to: how are they identified, passed 
around, created, destroyed, used and shared. 

Beyond this point, three ideas are fundamental to 
the framework being developed:

1. Objects are named by capabilities,³ which are
names that are protected by the system in the
sense that programs can move them around but
not change them or create them in an arbitrary
way. As a consequence, possession of a capa-
bility can be taken as prima facie proof of the
right to access the object it names.

2. A new kind of object called a domain is used to 
group capabilities. At any time a process is 
executing in some domain and hence can exercise 
the capabilities which belong to the domain. 
When control passes from one domain to an- 
other (in a suitably restricted fashion) the capa- 
bilities of the process will change. 

3. Capabilities are usually obtained by presenting 
domains which possess them with suitable 
authorization, in the form of a special kind of 
capability called an access key. Since a domain 
can possess capabilities, including access keys, 
it can carry its own identification. 
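The three ideas above can be sketched in Python as follows. This is only an illustrative model, not the BCC implementation: the class and method names (`Capability`, `AccessKey`, `Domain.present`, and so on) are hypothetical, and Python itself cannot enforce the protection that the system is assumed to provide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """A protected name: it can be copied and passed on, but not altered."""
    object_id: int
    rights: frozenset

@dataclass(frozen=True)
class AccessKey(Capability):
    """A special capability whose only use is to identify its holder."""

class Domain:
    """A named group of capabilities."""
    def __init__(self, name):
        self.name = name
        self._caps = {}      # local name -> Capability
        self._grants = {}    # AccessKey -> local names that key may obtain

    def hold(self, local_name, cap):
        self._caps[local_name] = cap

    def authorize(self, key, local_name):
        self._grants.setdefault(key, set()).add(local_name)

    def present(self, key, local_name):
        """Hand out a capability only when a suitable access key is presented."""
        if local_name not in self._grants.get(key, ()):
            raise PermissionError(f"{self.name}: key not authorized for {local_name}")
        return self._caps[local_name]

class Process:
    def __init__(self, domain):
        self.domain = domain          # the domain the process is executing in

    def enter(self, domain):
        self.domain = domain          # usable capabilities change with the domain

    def can(self, local_name, right):
        cap = self.domain._caps.get(local_name)
        return cap is not None and right in cap.rights

files = Domain("file-system")
user = Domain("user")
files.hold("payroll", Capability(object_id=7, rights=frozenset({"read"})))
key = AccessKey(object_id=100, rights=frozenset())
user.hold("my-key", key)              # the user domain carries its own identification
files.authorize(key, "payroll")
user.hold("payroll", files.present(key, "payroll"))
proc = Process(user)
```

Here `proc.can("payroll", "read")` holds only because the process is executing in the `user` domain; after `proc.enter(files)` the set of usable capabilities is that of the other domain.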

A key property of this framework is that it does not
distinguish any particular part of the computation. In
other words, a program running in one domain can
execute, expand the computation, access files and in
general exercise its capabilities without regard to who
created it or how far down in any hierarchy it is. Thus,
for example, a user program running under a debugging
system is quite free to create another incarnation of
the debugging system underneath him, which may in
turn create another user program which is not aware
in any way of its position in the scheme of things. In
particular, it is possible to reset things to a standard
state in one domain without disrupting higher ones.

The reason for placing so much weight on this prop- 
erty is two-fold. First of all, it provides a guarantee 
that programs can be glued together to make larger 
programs without elaborate prearrangements about
the nature of the common environment. Large systems 
with active user communities quickly build up sizable 
collections of valuable routines. The large ones in the 
collections, such as compilers, often prove useful as 
sub-routines of other programs. Thus, to implement 
language X it may be convenient to translate it into 
language Y, for which a compiler already exists. The X 
implementor is probably unaware that Y’s implemen- 
tation involves a further call on an assembler. If the 
basic system organization does not allow an arbitrarily 
complex structure to be built up from any point, this 
kind of operation will not be feasible. 

The second reason for concern about extendibility 
is that it allows deficiencies in the design of the system 
to be made up without changes in the basic system 
itself, simply by interposing another layer between the 
basic system and the user. This is especially important 


when we realize that different people may have different 
ideas about the nature of a deficiency. 

We now have outlined the main ideas of the paper. 
The remainder of the discussion is devoted to filling 
them out with examples and explanations. The entire 
scheme has been developed as part of the operating 
system for the Berkeley Computer Corporation Model 
I. Since many details and specific mechanisms are 
dependent on the characteristics of the surrounding 
system and underlying hardware, we digress briefly 
at this point to describe them. 

Environment 

The BCC Model I is an integrated hardware and soft- 
ware system designed to support a large number (up to 
500) of time-sharing users. This system consists of 
two central processors, several small processors, a large 
central (core and integrated circuit) memory, and rotat- 
ing magnetic memory. The latter contains more than 
500 × 10^6 bytes, including approximately 12 × 10^6 bytes
of drum having a transfer rate of more than 5 × 10^6
bytes per second. 

The hardware allows each process more than 512k
bytes of virtual memory. The central processors can 
accommodate operands of various sizes including 48- 
and 96-bit floating point numbers. The addressing 
structure allows characters, part-word fields and array 
elements to be referenced directly. The subroutine- 
calling instruction passes parameters and allocates 
stack space automatically. System calls are handled 
exactly like ordinary function calls; when arrays or
labels are passed to the system they are checked auto- 
matically by the hardware so that they can be used 
by the system without further ado. 

The memory management system organizes memory 
into pages. A page is identified by a 48-bit unique name 
which is guaranteed different for each page ever created 
in the system. Tables are maintained in the central 
memory which allow the page to be found in the various 
levels of the memory system. These tables are auto- 
matically accessed by the address mapping hardware 
the first time the page is referenced after the processor 
starts to run a new process. Thereafter its real core 
address is kept in fast registers. It is therefore unneces- 
sary for any program other than a small part of the 
basic system to be concerned about the location of a 
page in the memory system; when it is referenced, it 
will be brought into the central memory if it is not 
already there. Extensive facilities are provided, how- 
ever, to allow a process to control the level in the memo- 
ry hierarchy of the pages it is interested in. The work 
of managing the memory is done by a processor with 
read-only program memory and data access to the
central memory; this processor has a 100 ns cycle 
time, so that it can handle the large amount of com- 
puting required to keep up with demands placed on 
the memory system. Another small processor handles 
the remote terminals, which are multiplexed in groups 
of 20 to 100 at remote concentrators and brought 
into the system over high-speed lines.

Pages are grouped into files, which are treated as
randomly addressable sequences of pages. The only 
mechanism provided to access the data in a file is to 
put a page of the file into the virtual memory of a 
process. Files and processes are named and have pro- 
tection information associated with them. 

Domains in action 

Before plunging into a detailed analysis of capa- 
bilities and domains, we will look at some of the practi- 
cal situations which these facilities are designed to 
serve. They all have the same general character: several 
programs with different privileges exist. Each program 
corresponds to one domain. Some of the domains con- 
trol others, in the sense that the capabilities of a con- 
trolled domain are a subset of those of its controlling 
domain. As a first example, consider the command 
process CP of an operating system. This program 
accepts a command, perhaps from a remote terminal, 
and attempts to recognize it as a call on a program X 
which CP knows about. If it succeeds, CP calls on X for 
execution, passing it any parameters which were in- 
cluded in the command. To do this, CP must set up 
a suitable environment for X to function in. In par- 
ticular, enough memory must be provided for X to 
run, X must be loaded properly, and suitable input/ 
output must be available. When X is finished, it will 
return and CP can process a new command.

The key point is that we want CP to be protected 
from X, to ensure that the user’s commands continue 
to be processed even if X has bugs. In particular, we 
want to be sure that 

1. X does not destroy CP’s memory or files, so 
that CP can continue to run when X returns. 

2. CP can stop X if it goes wild. Usually we want 
the ability to set a time limit and also to inter- 
vene from the terminal. 

In other words, we want CP and X to run in separate
domains, as illustrated in Figure 1 (since this is an
informal discussion, we do not trouble to distinguish
carefully between the program X and the domain in
which it runs). Here we have shown the call from CP
to X in two forms: in the picture on the right, and as
a return capability in X. The reason for the capability
is that X cannot return with a simple branch oper-
ation, since it would then be able to start CP running
at any point, which would destroy the protection.

Figure 1 - A command processor and its command
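The role of the return capability can be illustrated with a short Python sketch. The names `ReturnCapability`, `CommandProcessor` and `program_x` are hypothetical, and the rule that a return capability may be used only once is an assumption added for this example rather than something stated in the text. Python does not actually prevent X from reaching CP's state, so the sketch shows the intended discipline, not an enforcement mechanism.

```python
class ReturnCapability:
    """Lets the callee resume the caller at one fixed point only."""
    def __init__(self, resume):
        self._resume = resume
        self._used = False

    def invoke(self, result):
        if self._used:                      # assumption: one use per call
            raise PermissionError("return capability already used")
        self._used = True
        return self._resume(result)

class CommandProcessor:
    def __init__(self):
        self.log = []                       # CP's own state, never passed to X

    def _after_command(self, result):
        # The single point at which X is allowed to re-enter CP.
        self.log.append(result)
        return "ready for next command"

    def run(self, program, *params):
        ret = ReturnCapability(self._after_command)
        return program(ret, *params)

def program_x(ret, text):
    # X does its work, then returns through the capability it was given.
    return ret.invoke(text.upper())

cp = CommandProcessor()
status = cp.run(program_x, "list files")
```

X is handed a capability rather than a branch address, so the only entry into CP that it can name is the one CP chose when making the call.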

Suppose now that we want to allow X to get addition- 
al commands executed. X might, for example, be a 
Fortran compiler whose output must be passed 
through an assembler. A simple way to do this is to 
put the assembler input on a file called, say, FOR- 
TRANTEMP, and issue the command, 

ASSEMBLE FORTRANTEMP, BINARY 

This command is just a string, which can easily be 
constructed by the compiler X. To get it executed, 
however, X must be able to call CP. This situation 
is illustrated in Figure 2; note the call capability in X,
which is quite different from the return capability. 
We are ignoring for the moment the question of how 
CP knows that X is authorized to call the assembler. 

Figure 2 - A recursive command processor

If the idea of the preceding paragraph is pursued, it
suggests the value of being able to switch the source
of command input and the destination of command
output in a flexible way. By these terms we mean the
traffic between a program and the entity by which it
is directed. In a time-sharing system this is normally 
a terminal at which the user is sitting; in a non-inter- 
active system it will be a file of control cards. It is 
often desirable, however, to switch between the two, 
so that routine processing can be done automatically 
when the user's attention is elsewhere, yet he can 
regain control when things go awry. Again, it is not 
uncommon to wish to capture a complete record of a 
conversation between user and machine for later 
analysis and replay. More radically, it may be of interest
to replace the user at his terminal with a program 
which can manipulate the strings of characters which 
constitute commands and responses. In this way major 
changes in the external appearance of a system can 
be obtained with little effort. 

All of these things can be accomplished by giving
interactions with the command I/O device the form of
calls to a different domain which acts as a switch. A
generalization to include the possibility of different
command devices for different domains is easy. Thus,
a user may initiate a program in a domain X which,
while continuing to communicate with him, starts a
subsidiary domain and feeds it commands. The sub-
sidiary, unaware of the way in which it is being driven,
may iterate the process by creating Z. The key fact
which makes it all work is the isolation of one domain
from others. Thus, Y may decide to close all its files
without disturbing X, since Y has no way of even
knowing about X's files, much less accessing them. Z,
on the other hand, can be an open book to Y. Various
aspects of the situation are illustrated in Figure 3.

Figure 3a - Switchable control I/O - the domains

Figure 3b - Switchable control I/O - the calls

Call sequence shown in Figure 3b: the top-level command
processor initiates a command which wants to drive
another command processor with some pre-stored or
computed input. It therefore creates another CP and
calls it, telling CIO to use MC for its I/O. The lower CP
is given a command to call the user program X. This
program needs input, which it gets by calling CIO, the
domain which is switching the control I/O. CIO calls
the current input source, which is MC.
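A minimal Python sketch of such a switching domain follows. The class name `ControlIO` and its methods are hypothetical; the sketch assumes that a source is any callable returning the next command and a sink is any callable accepting a line of output.

```python
class ControlIO:
    """Stands in for the domain that switches command input and output."""
    def __init__(self, source, sink):
        self._routes = [(source, sink)]

    def switch(self, source, sink):
        self._routes.append((source, sink))   # redirect subsequent traffic

    def restore(self):
        if len(self._routes) > 1:
            self._routes.pop()                # hand control back

    def read(self):
        source, _ = self._routes[-1]
        return source()

    def write(self, text):
        _, sink = self._routes[-1]
        sink(text)

terminal_input = iter(["RUN MC", "LOGOUT"])
transcript = []
cio = ControlIO(lambda: next(terminal_input), transcript.append)

first = cio.read()                            # command typed by the user
stored = iter(["CALL X", "input line for X"])
cio.switch(lambda: next(stored), transcript.append)   # a program now supplies commands
second = cio.read()
third = cio.read()
cio.write("X finished")
cio.restore()                                 # the user regains control
fourth = cio.read()
```

A program that calls `cio.read()` cannot tell whether its commands come from the terminal or from another program, which is the property the text relies on.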

This section concludes by analyzing a problem of 
great practical importance: how to construct a debug- 
ging system. This example is a good source of insights 
into the facilities required of a protection system be- 
cause of the great variety of things which can be ex- 
pected to go wrong during debugging. There are two 
domains, one for the debugger D and one for the pro- 
gram X being debugged. We of course want D to be 
protected from X. Equally important, we want X to 
be completely open to D, so that every object accessible 
to X is also accessible to D, and furthermore that D 
can find all the objects accessible to X as well as access 
them. Otherwise D will not be able to find out what X 
has done or to undo any damage. Furthermore, we 
want D to be able to imitate any actions which X 
can take, so that D can create suitable initial conditions 
for debugging parts of X. Thus, D needs operations 
which, given a capability for X, allow D to 

find all the capabilities in X 
copy capabilities between D and X 
destroy capabilities in X 

enter X at any point with any machine state

With these powers, D can also handle domains which
X has created, since it can get hold of X’s capabilities
for them. Breakpoints can be inserted in X in the
form of calls on D.

Domains and capabilities 

The nature of capabilities 

As we have already said, a capability is a protected 
name of an object. When any object is created, a 
capability is created to name it; without the capability 
the object might as well not exist, since there is no 
way to talk about it. The capability may be thought 
of as an ordinary data item enclosed in a box which 
prevents tampering with the contents. Thus, for ex- 
ample, it may be convenient to make a capability for 
a file consist of simply the disc address of its index. 
This is entirely satisfactory, since programs which 
handle the capability cannot modify it. If they could, 
disaster would ensue, since any program could put 
any desired disc address into a file capability, and 
there would be no protection at all. If the machine 
hardware allows a word to be tagged so that it cannot 
be modified except by the supervisor, then we have 
precisely what we want for a capability. The situation 
is illustrated in Figure 4. It should be possible to load 
and store such a word (including the tag bits) in order 
to give programs the necessary freedom to manipulate 
the names of the objects they are working with. 
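
The tagged word can be imitated in a few lines of Python. This is only a sketch: the frozen dataclass stands in for the hardware tag, and the names `Capability` and `try_to_forge` and the sample disc address are assumptions made for illustration, not part of the system described here.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Capability:
    """A protected name: programs may copy it but not alter its fields."""
    type: str    # kind of object named, e.g. "FILE"
    value: int   # for a file, the disc address of its index


def try_to_forge(cap: Capability, new_address: int) -> bool:
    """Return True if a user program managed to change the capability."""
    try:
        cap.value = new_address   # the tag makes this store illegal
        return True
    except FrozenInstanceError:
        return False


file_cap = Capability(type="FILE", value=0o4417)   # hypothetical disc address
copy_of_cap = file_cap            # loading and storing the whole word is allowed
forged = try_to_forge(file_cap, 0o7777)
```

Copying succeeds and the attempted modification is refused, which is the property the tag bits are meant to give. A frozen dataclass is a convention rather than real protection; only the supervisor-enforced tag provides that.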

If this kind of hardware is not available a different 
and potentially confusing implementation is required. 
The potential can be kept from realization by referring 
back to the “pure” implementation of the last para- 
graph. What is required is to hide the capabilities 
away in the supervisor and provide programs with 
unprotected names which can be used to refer to them. 
When a program running in domain D presents one 
of these names, it is necessary to check that it actually 
names a capability which belongs to D. This can easily
be done, if there are n such capabilities, by using
numbers between 1 and n for the names [3]. An attractive
alternative, if domains can be grouped into larger units
which share many capabilities, is to number the
domains from 1 to i and the entire collection of capabilities
from 1 to n, and to attach a string of i bits to
each capability. Bit d is on exactly when the capability
belongs to domain d. Figure 5 illustrates.

Figure 4: Structure of a capability (TAG = read-only, except to supervisor; TYPE = FILE; VALUE = disk address of index)

Figure 5: Capabilities and unprotected names

A somewhat more expensive implementation is to 
search a table associated with the domain whenever 
an unprotected name is used. This scheme shares with 
the bit-string idea the advantage that it is easy for 
different domains to use the same names for the same 
object. 
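
The bit-string scheme can be sketched as follows. The class `CapabilityStore` and its method names are hypothetical; the sketch assumes that capabilities live in a supervisor table, that programs hold only the integer names 1 to n, and that an integer serves as the string of i bits.

```python
class CapabilityStore:
    """Supervisor-side table; programs see only small integer names."""

    def __init__(self, num_domains: int):
        self.num_domains = num_domains
        self.capabilities = []   # index k holds the capability named k + 1
        self.owners = []         # owners[k] is an integer used as a bit string

    def add(self, capability, domains):
        """Store a capability owned by the given domains (numbered from 1)."""
        bits = 0
        for d in domains:
            if not 1 <= d <= self.num_domains:
                raise ValueError("no such domain")
            bits |= 1 << (d - 1)        # bit d is on when domain d owns it
        self.capabilities.append(capability)
        self.owners.append(bits)
        return len(self.capabilities)   # the unprotected name

    def lookup(self, domain: int, name: int):
        """Check that the name refers to a capability belonging to the domain."""
        if not 1 <= domain <= self.num_domains:
            raise ValueError("no such domain")
        if not 1 <= name <= len(self.capabilities):
            raise PermissionError("name out of range")
        if not (self.owners[name - 1] >> (domain - 1)) & 1:
            raise PermissionError("capability does not belong to this domain")
        return self.capabilities[name - 1]


store = CapabilityStore(num_domains=3)
shared = store.add("file:ALPHA", domains=[1, 2])
private = store.add("file:BETA", domains=[3])
```

Domains 1 and 2 use the same name for the shared object, which is the advantage claimed for this method; domain 1 presenting the name of the private capability is refused.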

There are capabilities for all the different kinds of 
objects in the system. On the Model I these are 

files 

pages of memory 
processes 
domains 
interrupt calls 
terminals 
access keys 

Domains and memory 

The nature of a domain is considerably more de- 
pendent on the underlying system than is the case 
for capabilities, mainly because of the treatment of 
memory. From a purist’s viewpoint, every access to a
memory word is an exercise of a capability for that
word. A more moderate position, and one which is
quite feasible on suitable hardware, is to view each
access as the exercise of a capability for a segment
which contains the word [2]. The mapping hardware
which implements segmentation is thus viewed as part
of the capability system, and a satisfying unity of
outlook is gained. Since a segment is identified by
number, the preceding section applies. We shall not
consider the formidable difficulties which arise if different
domains use different names for the same segment.

Fall Joint Computer Conference, 1969

If segments are accessed through capabilities like 
everything else, then a domain consists of nothing more 
than a collection of capabilities. On machines not 
equipped with the proper hardware a domain has an 
address space as well. In the Model I this is a list of 
the pages which occupy each of the 64 slots for pages 
in the 128k memory which is accessible to a user pro- 
gram. 

It is also necessary to deal with the fact that the
hardware does not allow one domain to access the
address space of another one directly. This fact is of
great importance when we consider how data is passed
back and forth between domains, since it implies that
arrays cannot be passed simply by specifying their
addresses. It is therefore extremely convenient to include
as part of a call the ability to pass scalar data
items, and essential to include the ability to pass capabilities.
From this foundation arbitrarily complex communication
can be built, since capabilities for pages,
files and domains can be passed. Thus, if an array needs
to be passed as a parameter, it is sufficient to pass
capabilities for the pages or file containing the array,
together with its base address and length. The called
domain can then put the pages into its address space
and access the array. This is of course much less convenient
than passing an entire segment as a parameter,
but it is quite workable.
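
Passing an array by way of page capabilities can be sketched like this. The sketch assumes 2k-word pages and 64 page slots per address space, lets a Python reference to a `Page` stand in for a capability, and gives the array's position as an offset from the start of the first page passed. The classes and `sum_array` are hypothetical names.

```python
PAGE_SIZE = 2048  # words per page (assumed 2k-word pages)


class Page:
    """A page of memory; holding a reference stands in for holding a capability."""

    def __init__(self):
        self.words = [0] * PAGE_SIZE


class Domain:
    """A domain's address space: numbered slots, each empty or holding a page."""

    def __init__(self, slots=64):
        self.slots = [None] * slots

    def map_page(self, page):
        """Put the page into the first free slot and return the slot number."""
        slot = self.slots.index(None)
        self.slots[slot] = page
        return slot

    def read(self, address):
        page = self.slots[address // PAGE_SIZE]
        if page is None:
            raise MemoryError("no page in this slot")
        return page.words[address % PAGE_SIZE]

    def write(self, address, value):
        page = self.slots[address // PAGE_SIZE]
        if page is None:
            raise MemoryError("no page in this slot")
        page.words[address % PAGE_SIZE] = value


def sum_array(callee, pages, offset, length):
    """Called domain: map the passed pages, then read the array through them.

    Assumes the callee's free slots are consecutive, so that the pages land
    next to each other in its address space.
    """
    first_slot = [callee.map_page(page) for page in pages][0]
    base = first_slot * PAGE_SIZE + offset
    return sum(callee.read(base + i) for i in range(length))


# The caller builds a 10-word array that straddles two of its pages.
caller, callee = Domain(), Domain()
pages = [Page(), Page()]
for page in pages:
    caller.map_page(page)
for i in range(10):
    caller.write(2044 + i, i + 1)

total = sum_array(callee, pages, offset=2044, length=10)
```

The called domain never sees the caller's addresses; it works only with the pages it has been handed and the offset and length passed as scalars.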

An alternative approach is to organize the hardware
so that the address space of one domain is a subset of
that of another. This eliminates all problems when the
smaller one calls the larger, although it does not help
at all when we want to share only part of the address
space. A subset organization fits well with a linear or
“ring”-like system [4] in which the domains are numbered,
and the capabilities of domain i are a subset of those
of domain i-1. As we shall see, there are good reasons
for wanting a more flexible scheme, but for a great 
many applications a linear ordering is quite satisfactory. 
To allow these to be handled more efficiently, the 
Model I hardware breaks the address space of a process 
into three rings: 


monitor 

utility 

user 

in decreasing order of strength. The hardware enforces 
a restriction that addressing cannot go into a higher 
ring. It also provides protected entry points into the 
utility and monitor rings and automatically checks 
addresses passed into these rings as parameters to 
ensure that they are legal in the ring from which they 
came. 
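
The two hardware checks can be written down in a few lines. This is a sketch under an assumed numbering in which a smaller number means a stronger ring; the function names are hypothetical and the mapping from an address to its ring is left out.

```python
MONITOR, UTILITY, USER = 0, 1, 2   # assumed numbering: smaller = stronger ring


def can_address(current_ring: int, target_ring: int) -> bool:
    """Addressing may stay in the same ring or go to a weaker one, never higher."""
    return target_ring >= current_ring


def check_parameter(caller_ring: int, address_ring: int) -> None:
    """On entry to a stronger ring, an address passed as a parameter must be
    one that the caller could legally have used itself."""
    if not can_address(caller_ring, address_ring):
        raise PermissionError("address is not legal in the calling ring")


monitor_reads_user = can_address(MONITOR, USER)
user_reads_utility = can_address(USER, UTILITY)
```

The parameter check is what prevents a user program from getting the monitor to touch memory on its behalf that the user program could not have addressed directly.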

This simple hardware-implemented structure permits 
three domains to transfer control around among each 
other and to address each other's memory in a very 
convenient and efficient way. The price paid is a ri- 
gidity in structure, and a drastic incompatibility with 
the main, software-implemented domain mechanism. 
The incompatibility is resolved by requiring a change
in ring to be reported to the software, except when the
only processing to be performed before returning to the
original ring can be done with the capabilities of the
original ring. Short calls thus remain cheap, while the
overhead added to longer ones is not excessive.

Domains and processes 

The relationship between domains and processes is 
another area greatly influenced by the surrounding 
system. The logical nature of the two kinds of object 
allows a great deal of freedom: in fact, a domain has 
much the same appearance to a process that a segment 
of memory does. The storage for capabilities provided 
by a domain can accommodate many processes, and a 
single process can switch from one domain to another 
(subject to restrictions which are considered in the 
next section). 

In the Model I, however, storage is allocated in 2k 
pages, and one of these, called the context block, is 
used to hold the system-maintained private data for 
each process. The cost of having a process is thus high, 
and there is considerable incentive to minimize the 
number of processes; usually one is enough per compu- 
tation, if advantage is taken of the interrupt facilities 
described later. When the usage of space in the context 
block is analyzed, it turns out that there are only two 
items which would have to be duplicated to allow 
several processes to run with the same address space. 
These are a 14-word machine state and a stack used 
for local storage when the supervisor is executing in 
the process. This stack has a minimum of about 60 
words and can grow to several hundred words at certain 
points during supervisor execution. It is therefore the
main barrier to the existence of cheap processes. The
problem can be greatly alleviated by allocating stack 
space dynamically at each function call and releasing 
it at each return, but this would require some major 
changes in system organization. 

Although processes are expensive, domains are quite 
cheap, since the bit-string method is used to assign 
capabilities to domains. Each process in the Model I 
can have about a dozen domains associated with it. 
The process can run in any of its associated domains 
but in no others. This implies that two processes never 
run in the same domain. 

In a system in which processes are cheap, it is possible 
to take an entirely different approach which encourages 
the creation of processes for every purpose. In such a 
system, parallel processing is of course greatly facili- 
tated. In addition, free creation of processes can be 
used to give a somewhat different form to many of 
the facilities described in this paper [3].

It is perhaps worthwhile to point out that a machine 
whose addressing is not organized around a stack or 
base registers cannot reasonably run several processes 
out of the same domain unless they are executing total- 
ly disjoint code, because of the problem of address 
conflicts. 

Transfers of control 

Calls 

The only reason for creating a domain is to establish 
an environment in which a process may execute with 
different protection than that provided by any existing 
domain. If this objective is to be fulfilled, transfers of
control between domains must be handled with great 
care, since they generally imply the acquisition of 
new capabilities. If it is possible for a process running 
in domain X to suddenly jump into domain Y and 
continue execution at any arbitrary point, X can cer- 
tainly induce Y to damage the objects accessible 
through Y’s capabilities. 

To provide an adequate mechanism for transfers
between domains, we introduce the idea of a protected
entry point or gate, and make the rule that transfer
into a domain is normally allowed only at a gate. A
gate is a new kind of capability which can be created 
by anyone with a capability for the domain. It specifies 
a location to which control is to go when the gate is 
used. Gates can be passed around freely like other 
capabilities, and each one may be viewed as conferring 
a certain amount of power, namely the power to accomplish
whatever the routine entered by the gate is
designed to do. With gates it is possible to selectively
distribute the powers of a domain in a flexible way.

A transfer through a gate usually takes the form of 
a subroutine call; some provision must therefore be 
made for a return. It is not satisfactory to create 
another gate which the called process may return 
through, since he might save it away and use it to 
return at some later and unexpected time. Instead, 
the domain and location to return to are saved on a 
call stack in the supervisor, from which the return 
operation can retrieve them. It is possible to call a 
domain recursively with this mechanism, a feature 
which is generally desirable and also quite important 
for the trap and interrupt system about to be described. 

In order to allow the stack to be reset in case of an
error, or for any of the other reasons which prompt
programmers to reset stacks, a jump-return(n) operation
is provided which returns to the domain n levels
back. Protection is maintained by requiring the domain
doing the jump-return to have capabilities for all the
domains being jumped over.
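
The call, return and jump-return operations can be sketched with a supervisor-held stack. The names `Gate` and `Supervisor` are hypothetical, domains are represented by strings, and the sketch assumes that the "domains being jumped over" are those of the stack entries lying between the current domain and the one returned to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """A protected entry point: a domain and a location within it."""
    domain: str
    location: int


class Supervisor:
    def __init__(self, domain, location=0):
        self.current = (domain, location)   # where the process is executing
        self.call_stack = []                # saved (domain, location) pairs

    def call(self, gate):
        """Enter the gate's domain, saving where to return in the supervisor."""
        self.call_stack.append(self.current)
        self.current = (gate.domain, gate.location)

    def ret(self):
        self.current = self.call_stack.pop()

    def jump_return(self, n, held_domains):
        """Return n levels back; held_domains are the domains for which the
        running domain has capabilities."""
        if not 1 <= n <= len(self.call_stack):
            raise ValueError("not that many levels on the call stack")
        skipped = self.call_stack[len(self.call_stack) - n + 1:]
        for domain, _ in skipped:
            if domain not in held_domains:
                raise PermissionError("no capability for domain " + domain)
        self.current = self.call_stack[-n]
        del self.call_stack[-n:]


sup = Supervisor("A")
sup.call(Gate("B", 100))
sup.call(Gate("C", 200))
try:
    sup.jump_return(2, held_domains=set())    # C holds nothing for B
    refused = False
except PermissionError:
    refused = True
sup.jump_return(2, held_domains={"B"})        # allowed: back to A, over B
```

Because the return information never leaves the supervisor, the called domain has nothing to save away and use later, which is the objection raised above to returning through a second gate.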

Traps 

A trap is caused by the occurrence of some unusual 
event in the execution of the program which requires 
special handling, such as a floating point overflow, a 
memory protection violation or an end of file. When a 
trap occurs, it forces control to go to a specified place,
where presumably a routine has been put to deal with
the event. Whether any particular event causes a trap
or simply sets a flag which can be tested by the program
is a decision which should be under the programmer’s
control. Traps may be initiated by hardware (e.g.,
floating overflow) or may be artifacts of the software;
as with most distinctions between hardware and soft- 
ware implementation, this one is of little importance, 
and we expect all traps to be transmitted to the program 
in the same form, regardless of their origin. 

These are all obvious points which are generally 
accepted, and have even become embedded in the 
definition of PL/I. What concerns us here is the re- 
lationship between traps and domains, which is not 
quite so obvious. The basic problem is that the response
to a trap must be made to depend on the environment
in which it occurs. The occurrence of, say, a
floating overflow is simply a fact, and has nothing to 
do with who is running. The action to be taken, on the 
other hand, is entirely a function of the situation. 
Consider the example in Figure 6. If a floating overflow 
occurs with the call stack in state (b), it is clear that
C should have the first chance to handle the trap. If
it is not interested, the domain B which called it should
have the second chance. In state (c), on the other hand,
domain B should have the first chance, and then A.
The reason for this is that we do not wish to give up
control to a weaker domain when a trap occurs.

Figure 6: Traps and trap-returns. The domain table (columns Name, Domain, Traps) lists A (command processor; traps: CATCHALL) and B (statistical package; traps: FLTOV, SINGMTX), with a further entry reading "Inversion". Panel (b) shows the call stack. The remaining panels are captioned "the matrix inverter processes a floating overflow", "the matrix inverter returns with trap-return (SINGMTX)", and "the matrix inverter returns with trap-return (BAD DATA)".

The idea is then the following: Each domain is 
considered to have a father. When a trap occurs, it is 
first directed to the domain S which is running. If S 
does not have the trap enabled, the father of S is 
tried in the same way. If no one can be found to handle 
the trap, there are two possibilities: 


ignore it; 

generate a catchall trap which any domain that 
lacks a father is forced to handle. 
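
The search for a domain to handle a trap can be sketched as a walk up the chain of fathers. The `Domain` class and the functions are hypothetical, and the example domains only loosely follow Figure 6: it is an assumption here that B is the father of C and that C has FLTOV enabled.

```python
class Domain:
    def __init__(self, name, father=None, enabled=()):
        self.name = name
        self.father = father          # None for a domain that lacks a father
        self.enabled = set(enabled)   # names of the traps this domain has enabled


def find_handler(running, trap):
    """Try the running domain, then its father, and so on up the chain."""
    domain = running
    while domain is not None:
        if trap in domain.enabled:
            return domain
        domain = domain.father
    return None


def deliver(running, trap, use_catchall=True):
    """Return (handling domain, trap name), or None if the trap is ignored."""
    handler = find_handler(running, trap)
    if handler is not None:
        return handler.name, trap
    if not use_catchall:
        return None                    # first possibility: ignore the trap
    root = running
    while root.father is not None:
        root = root.father
    return root.name, "CATCHALL"       # second possibility: the catchall trap


a = Domain("A", enabled={"CATCHALL"})
b = Domain("B", father=a, enabled={"FLTOV", "SINGMTX"})
c = Domain("C", father=b, enabled={"FLTOV"})

overflow_in_c = deliver(c, "FLTOV")
singular_in_c = deliver(c, "SINGMTX")
unclaimed = deliver(c, "EOF")
```

With C running, a floating overflow goes to C itself, a singular-matrix trap goes to B, and a trap that nobody has enabled ends up as a catchall trap in A.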


If a domain T is found with the trap enabled, it is
called with the name of the trap as argument. It can
then return and allow execution to proceed if it is
able to clear things up. Alternatively, it can do a
jump-return to someone farther back on the call stack
if it finds the situation to be hopeless. An important
property of this scheme is that the trap routine can do
arbitrarily complex processing without disturbing the
situation at the time of the trap.

Conceptually, we wish to think of traps as identified
by symbolic names. Each domain must then include a
list of names of the traps it has enabled. Corresponding
to each hardware-generated trap is a standard name.
Software-generated traps can use any names, including
the ones for hardware traps. This makes it easy for a
subroutine to simulate the occurrence of a hardware
condition which it may not be convenient to produce.

A simple extension of the return operation to a
trap-return allows a routine to signal an error without
leaving any traces of itself; the trap-return does a
return and immediately causes the specified trap,
without allowing any execution beyond the return
point. The domain which handles the trap then sees
it as having occurred in the calling routine, which is
exactly what is wanted. Thus in Figure 6 we have a
matrix inversion routine which processes its own
floating overflows, but reflects two other conditions
to its caller with trap-return. Another useful convention
is to disable the trap when it occurs. This
makes it much less likely that the program will get
into a loop, especially for such traps as illegal instruction
and memory protection violation.

Interrupts

There remains one more way to cause a transfer
between domains: the occurrence of an interrupt. This
is not intended to be the normal mechanism for communication
between cooperating processes; the basic
block and wake-up mechanisms [6] are expected to perform
that function. There are times, however, when it
is desirable to force a process to do something, even
if it is not paying attention. Two obvious reasons for
this are:

a quit signal from the terminal, which indicates
that the user wants to regain control over a process
which has gone into a loop, or perhaps simply
become unnecessarily wordy;

the elapse of a certain amount of time, which
has much the same meaning.


The action required in these two cases is different. 
When a timer interrupt is requested (and there may be 
two kinds, for real time and CPU time) the desired 
action is usually to call a specific domain, often the 
one which is setting the timer. If another domain 
wants a timer, it will use one which is logically different. 
The user's quit signal, on the other hand, is context 
dependent like a trap; the desired action is a function 
of the routine which is running when the signal arrives. 
Thus an iterative root-finder may interpret a quit as 
an indication that the solution is accurate enough, 
but the debugging system under which it may be running
will curtail its printing when it sees a quit and
await a new command. This analysis suggests a simple 
implementation: convert the quit into a trap from the 
currently executing domain. Each interrupt, then, will 
give rise to a call or a trap, depending on its type as 
declared by the programmer. 

Even when we see how to convert them into oper- 
ations within the process, interrupts still present one 
serious problem which does not arise in the handling 
of traps. This is the fact that a program occasionally 
needs to be allowed to compute for a while without 
losing control. Usually this happens when modifi- 
cations are being made to a data base; if a quit signal 
should appear or a timer run out halfway through this 
operation, the data is left in a peculiar state. The 
obvious solution is to allow a process to become non- 
interruptible for a limited period of time. The function 
of the limit is to prevent the process from getting into 
a state from which it cannot be retrieved; exceeding 
it is a programming error and always causes the process 
to become interruptible again and an error trap to 
occur, regardless of whether an interrupt is actually 
pending. The limit is properly measured in real time, 
since its primary purpose is to put a bound on the 
frustration of the user at his console. 

Non-interruptibility is a process-wide condition. It 
must be possible, however, for a newly-called domain 
to extend the limit exactly once, so that it can function 
properly even though its caller is about to exceed his 
limit. The limit is thus part of a call stack entry. When 
a return occurs, the old limit comes back into force, 
and an immediate trap may occur if it has been ex- 
ceeded. 
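
The bookkeeping for the limit might look like the sketch below. It assumes times in milliseconds supplied by the caller, a hypothetical trap name `LIMIT_EXCEEDED`, and that only a newly called domain (not the top-level one) may extend the limit. None of these details is fixed by the description above.

```python
class Process:
    """Tracks the non-interruptibility limit; times are in milliseconds."""

    def __init__(self):
        self.deadline = None      # real time by which the process must be interruptible again
        self.may_extend = False   # a newly called domain may extend the limit once
        self.saved = []           # limits saved in call stack entries

    def become_noninterruptible(self, now, duration):
        self.deadline = now + duration

    def call(self):
        """Entering a new domain saves the caller's limit in the call stack entry."""
        self.saved.append((self.deadline, self.may_extend))
        self.may_extend = True

    def extend_limit(self, now, duration):
        if not self.may_extend:
            raise PermissionError("the limit may be extended only once per call")
        self.may_extend = False
        self.deadline = now + duration

    def check(self, now):
        """Return the name of an error trap if the limit has been exceeded."""
        if self.deadline is not None and now > self.deadline:
            self.deadline = None          # the process becomes interruptible again
            return "LIMIT_EXCEEDED"
        return None

    def ret(self, now):
        """On return the caller's old limit comes back into force."""
        self.deadline, self.may_extend = self.saved.pop()
        return self.check(now)


p = Process()
p.become_noninterruptible(now=0, duration=1000)
p.call()
p.extend_limit(now=900, duration=1000)   # the called domain's limit is now 1900
during = p.check(now=1500)               # still within the extended limit
after = p.ret(now=1500)                  # the caller's limit of 1000 has passed
```

The called domain runs to completion under its own extended limit, and the trap for the caller's overrun is raised only when control returns to it.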

Table I summarizes the operations connected with 
transfers of control between domains. 


TABLE I: Operations for transfers

Operation      Arguments

Call           Gate, Parameters
Return         Parameters
Jump           Gate, Parameters
Jump-return    Depth, Parameters
Trap           Trap number
Trap-return    Trap number



Proprietary programs 

The remainder of this paper deals with the protection
problems introduced when objects are allowed
to have external, mnemonic names. The examples in
this section are intended to introduce this subject, and
are also of interest in their own right. Suppose then
that a user U has a program executing in domain P 
and wishes to perform a circuit analysis. P has gener- 
ated the input data for the analysis, and intends to 
use the results for further calculation. Within the 
system M on which P is running, some user V has 
written a suitable analysis program A which he has 
offered for sale, and U has decided to use V's program. 
It happens that U and V are competitors. 

Both users in this situation have selfish interests 
to protect. First, and most obvious, V does not want 
his program stolen. He therefore insists that while it 
is executing U must not be allowed to read it. Equally 
important, however, is the fact that U does not want 
V’s program to be able to read the calling program P 
and its data; although U may not be trying to market 
P, it, and especially its data, contain valuable infor- 
mation about U's current development work which 
must be kept from competitors. The relationship 
between U and V, and between their programs P and A, 
is therefore one of mutual suspicion. Each is willing 
to entrust the other with just enough information 
to allow the circuit analysis to be completed, and no 
more. The system must support this requirement if it 
is to be a suitable vehicle for selling programs. 

Furthermore, care must be taken beyond the programs.
While P is running it needs the ability to access
U's files by name, to read input data and record
results. This privilege must certainly not be extended 
to A, since it can learn even more about U's secrets 
by examining his files than by looking at his program, 
not to mention the possibility of modifying them. On 
the other hand, A may need access to V's files to obtain 
data for the analysis and to collect statistics and ac- 
counting information; this access must not be available 
to P. The protection mechanisms must therefore pro- 
vide for isolating P and A at the level of file naming as 
well as on the lower levels which have been the subject 
of this paper so far. 

What is required then is a system facility something
like this. V establishes A as a proprietary program,
specifying the file on which it resides. Another user's
program P may then ask the system to attach this 
file. To do this, the system creates a new domain A, 
installs the program in it, provides it with some storage, 
and returns to P a gate into A. When P wants to call 
A, he uses the gate and passes whatever parameters 
he thinks are needed for A to function. When A is 
finished, he returns. The protection mechanisms we
have been discussing prevent undesired interference
between P and A. Safeguards for the files are discussed 
below. 

The example above is one of a great variety of similar 
situations. The system itself creates many of them. A 
LOGOUT command, for example, requires special ac- 
cess to accounting files and to capabilities for destroying 
a process, but it would be nice to call it with the 
standard command processor. Similarly, driving a 
special peripheral like a printer requires special capa- 
bilities. If a company maintains a large data base, it 
may wish to give different classes of users access to 
different parts of it by allowing them to call different 
accessing programs. These and many other applications 
fall within the general outline established by our pro- 
prietary program example. We now proceed to consider 
how to handle the file naming problems it presents. 

External names 

Table II lists the goals of a naming system for objects,
and indicates some of the distinctions between the
use of capabilities as names, which has been discussed
in previous sections, and the use of external names,
which are strings of characters such as ‘FILE1’ or
‘CIRCUIT’. In summary, it says that capabilities are
very convenient for use by a program, since they are
cheap and self-validating. On the other hand, they are
very bad for people, since they cannot be typed in or
remembered. Names for people should also have the
property that the same name can refer to many different
objects, the distinctions to be made by context.
Thus, Smith’s file ‘ALPHA’ is not the same as Jones’
‘ALPHA’.


TABLE II: Goals of a naming system for objects

Goal                                     Achieved by     Achieved by
                                         capabilities    external names

Names are mnemonic                                       X
Names can be relative to other names                     X
Names can be used externally                             X
Possession of name authorizes access     X
Names are cheap to use                   X
Names can be manipulated by programs     X               X



Techniques for achieving all these goals are well 
known. They depend on the introduction of a new kind 
of object called a directory, which consists of pairs: 
< external name, capability >, and an operation of 
opening an object by supplying the name to obtain 
the capability. Since the external name is interpreted 
relative to a directory, there is a suitable basis for 
establishing the context of a name. A tree-structured 
naming system is implicit in the scheme, because 
directories are themselves objects accessed by capabilities.
It is now easy to see how a program in a domain
D accesses the objects belonging to owner U. When D
is created, it is supplied with a capability for U’s
directory, which it simply exercises.
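
A directory of this kind can be sketched as a mapping from external names to capabilities. The class and function names are hypothetical, and ordinary Python references and strings stand in for capabilities.

```python
class Directory:
    """A set of pairs <external name, capability>."""

    def __init__(self):
        self._entries = {}

    def enter(self, name, capability):
        self._entries[name] = capability

    def open(self, name):
        """Supply an external name to obtain the capability filed under it."""
        return self._entries[name]


def open_path(directory, names):
    """Open a chain of names, each interpreted relative to the previous result."""
    obj = directory
    for name in names:
        obj = obj.open(name)
    return obj


smith, jones, root = Directory(), Directory(), Directory()
smith.enter("ALPHA", "capability for Smith's file")
jones.enter("ALPHA", "capability for Jones's file")
root.enter("SMITH", smith)    # directories are themselves named by capabilities
root.enter("JONES", jones)

smiths_alpha = open_path(root, ["SMITH", "ALPHA"])
joness_alpha = open_path(root, ["JONES", "ALPHA"])
```

The same external name yields different objects depending on the directory in which it is interpreted, and the tree structure arises simply because a directory can hold capabilities for other directories.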

There is more controversy over the proper methods 
of accessing objects belonging to other users. A popular 
approach is to use passwords: a public read-only 
directory is filled with capabilities for all other directo- 
ries which allow the objects in them to be accessed 
provided a correct password (usually different for each 
object) is supplied as part of the opening operation. 
This method is not satisfactory. First, it is inconvenient, 
since it requires the person accessing the file to re- 
member the password. Second, it is insecure. If he 
writes the password down, or includes it in a program, 
the possibility increases that it will become known. It 
is bad enough to have to use a password to obtain 
entry to the system, but at least only one password is 
involved, it is used only once per session, and it can 
be changed, if need be after each session, without too 
much fuss. None of these things is true of passwords
attached to files: there are many of them, many people
need to know them, and one must be used each time
a file is opened. This scheme has no advantage except 
economy of implementation. 

A method based entirely on capabilities suffers only 
one of these drawbacks: it is inconvenient, but secure. 
It is also, however, quite complex. The idea is that if 
a file (or anything else) is to be shared, a capability 
for it should be passed from its owner to those who 
wish to share it. The problem is that a capability, 
being a protected object, must be passed through pro- 
tected channels; it cannot be sent in a letter, even a 
registered letter. The solution is illustrated in Figure 
7. Every user has (at least) two directories, a private 
one which he works with, and a transfer directory. The 
public directory PUB, for which every user has a read
capability, contains write capabilities for all the transfer
directories. The object is to move the capability
for X from PDA to PDB. Proceed as follows:

A moves a capability for TDB into PDA

Using it, A moves his capability for X to TDB

B moves the capability for X from TDB to PDB

Figure 7: Sharing capabilities without access keys


Since only B can access TDB, security is preserved. A 
malicious user can confuse things by writing random 
capabilities into the TDs, but it is easy for B to check 
that he has gotten the right thing. Furthermore, if X 
is a directory, future communication can be carried 
out quite conveniently, since A and B can then com- 
municate through X without any worries about out- 
side interference. 
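
The three steps can be traced in a small model. Everything named here (`Cap`, `Directory`, the entry functions, the rights "read" and "write") is an assumption made for the sketch; only the directory names PUB, PDA, PDB and TDB come from the text.

```python
from dataclasses import dataclass


class Directory:
    def __init__(self):
        self.entries = {}          # external name -> capability


@dataclass(frozen=True)
class Cap:
    target: object                 # the object named
    rights: frozenset              # subset of {"read", "write"}


def read_entry(cap, name):
    if "read" not in cap.rights:
        raise PermissionError("capability does not allow reading")
    return cap.target.entries[name]


def write_entry(cap, name, value):
    if "write" not in cap.rights:
        raise PermissionError("capability does not allow writing")
    cap.target.entries[name] = value


def take_entry(cap, name):
    """Remove an entry and return it; needs both read and write access."""
    value = read_entry(cap, name)
    if "write" not in cap.rights:
        raise PermissionError("capability does not allow writing")
    del cap.target.entries[name]
    return value


RW = frozenset({"read", "write"})
pub, pda, pdb, tdb = Directory(), Directory(), Directory(), Directory()
pub.entries["TDB"] = Cap(tdb, frozenset({"write"}))   # PUB holds write capabilities
a_pub = Cap(pub, frozenset({"read"}))                 # every user can read PUB
a_pda, b_pdb, b_tdb = Cap(pda, RW), Cap(pdb, RW), Cap(tdb, RW)
pda.entries["X"] = Cap("file X", frozenset({"read"}))

# Step 1: A moves a capability for TDB into PDA.
write_entry(a_pda, "TDB", read_entry(a_pub, "TDB"))
# Step 2: using it, A moves his capability for X to TDB.
to_tdb = read_entry(a_pda, "TDB")
write_entry(to_tdb, "X", read_entry(a_pda, "X"))
# Step 3: B moves the capability for X from TDB to PDB.
write_entry(b_pdb, "X", take_entry(b_tdb, "X"))
```

A holds only a write capability for TDB, so it can deposit capabilities there but cannot read what others have deposited; only B can take entries out.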

A much better method is based on the simple idea 
of attaching to a directory entry a list of the users 
who are allowed to access it; with each user we can 
also specify options, so that Rosenkrantz may be 
granted write access to the file while Guildenstern can 
only read it. This scheme, which was first used in
CTSS [1], has two drawbacks. The first is that if the list
of users who are authorized to access a file is long, it
takes a lot of space to store it; this problem is especially
annoying if there are several files to be accessed by the 
same group of users. The second drawback is that there 
is no provision for giving different kinds of access to 
different domains of a computation. Both difficulties 
can be overcome in a rather straightforward manner. 

Before we pursue this point, it is important to notice
why the difficulty encountered above in the capability-passing
scheme does not arise here. We can think of
the computation of a logged-in user as possessing a
special kind of capability which identifies it as belonging
to him. If SMITH is the user, we will refer to
this capability as SMITH*, meaning that the string
‘SMITH’ has been enclosed in a tamper-proof box.
When JONES wishes to give SMITH access to his
file ALPHA, he puts the name SMITH on the access
list; JONES can do this since he has a capability for
ALPHA. When a computation presents the capability
SMITH*, the system observes that the string (or user
number) which is the contents of the capability matches
the string on the access list and grants the access.
At no time is it necessary for JONES to have SMITH*
in his possession. He needs only the name SMITH
which, since it is not a protected object, can be communicated
to him by shouting across the room. Figure
8 illustrates.

Figure 8: Use of access keys. Labels in the figure: "SMITH*", "Capabilities for SMITH's computation before opening the file", "Capabilities for SMITH's computation after opening the file".

To generalize the method we need two ideas. One 
is that of an access key. This is an object (i.e., it can
be referenced only by using a capability) which con- 
sists simply of a bit string of modest length, long 
enough that the number of different access keys is 
larger than the number of microseconds the system 
will be in existence. Any user may ask the system for a 
new access key; the system will create one never seen 
before and return a capability for it. The object SMITH*
mentioned in the last paragraph is an example of an
access key; one is kept for each user in the system. 
Since an access key is an object, capabilities for it 
appear in the directories and are protected exactly as 
is done for any other object (since the access key is a 
small object, it may be convenient for the imple- 
mentation not to give it any existence independently 
of the capabilities for it, i.e., to make the value of the 
capability the object itself, rather than a pointer to 
it as in the case of files). To give a group of users access 
to some files, all we have to do is distribute a new 
access key GROUP* to the users and put GROUP 
on the access list for each file. The distribution is 
accomplished by creating GROUP* and putting all 
the users on its access list; once they have copied it 
into their directories they can be removed from the 
access list, so that no space need be wasted. In practice, 
as we have pointed out, numbers of perhaps 64 bits 
would be used instead of strings like ‘GROUP’.
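
The access-key idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `AccessKey`, `File`, and `System` and the method names `new_access_key` and `may_access` are hypothetical, a key's value is modeled as an ever-increasing integer rather than a 64-bit string, and "holding a capability" is modeled as holding the genuine Python object issued by the system.

```python
import itertools


class AccessKey:
    """A protected object whose only content is a bit string (here an int)."""

    def __init__(self, value):
        self.value = value


class File:
    def __init__(self, name):
        self.name = name
        self.access_list = set()  # public key values, e.g. the number for GROUP


class System:
    def __init__(self):
        self._next = itertools.count(1)  # never repeats, so each key is new
        self._issued = set()             # genuine key objects, known to the system

    def new_access_key(self):
        key = AccessKey(next(self._next))
        self._issued.add(key)            # default hashing is by object identity
        return key                       # stands in for a capability for the key

    def may_access(self, file, presented):
        # The presenter must hold a genuine key, not merely know its value.
        return presented in self._issued and presented.value in file.access_list
```

In this sketch the owner of a file needs only the public value to extend the access list, while a computation that merely knows the value (and builds its own `AccessKey` from it) is refused, which mirrors the SMITH versus SMITH* distinction.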

The second idea is not new at all. It consists of the 
observation that since an access key is just an object, 
different domains can have different access keys and 
hence different kinds of access to the file system. Thus, 
for example, a user’s computation may be started with 
two domains, one for his program with his name as 
access key, and the other for system accounting with 
an access key which allows it to write into the billing 
files. With a single suitable access key, a domain can 
easily get hold of an arbitrarily large collection of 
other objects which are protected by other keys, since
the first key can be used to obtain other keys from the
directory system.

SUMMARY 

We have described a very general scheme for dis- 
tributing access to objects among the various parts of 
a computation in an extremely specific and flexible 
way. The scheme allows two domains to work together 
with any degree of intimacy, from complete trust to 
bitter mutual suspicion. It also allows a domain to 
exercise firm control over everything created by it or 
its subsidiaries. 
INTRODUCTION

In the past decade, many computer systems intended 
for operational use by large military and govern- 
mental organizations have been “custom made” to 
meet the needs of the particular operational situation 
for which they were intended. In recent years, how- 
ever, there has been a growing realization that this 
design approach is not the best method for long term 
system development. Rather, the development of 
general purpose systems has been promoted that 
provide a broad, general base on which to configure 
new systems. The concepts of time-sharing and gen- 
eral-purpose data management have been under de- 
velopment for several years, particularly in university 
or research settings. 1, 2, 3 These methods of computer
usage have been tested, evaluated, and refined to 
the point where today they are ready to be exploited 
by a broad user community. 

Work on the Advanced Development Prototype 
(ADP) contract was begun in January 1967 for the 
purpose of demonstrating — in an operational envi- 
ronment — the potential of automatic information- 
handling made possible by recent advances in com- 
puter technology, particularly advances in time- 
sharing executives and general-purpose data manage- 
ment techniques. The result of this work is a large- 
scale, multi-purpose system known as ADEPT, which 


operates on IBM System/360 computers.*

The entire ADEPT system is now being used at 
four field installations in the Washington, D. C. area, 
as well as at SDC in Santa Monica. The system was 
installed at the National Military Command System 
Support Center in May 1968, at the Air Force Com- 
mand Post in August 1968, and at two other govern- 
ment agencies in January 1969. These four field sites 
collectively run ADEPT from 80 to 100 hours per 
week, providing a total of some 2000 terminal hours 
of time-sharing service monthly to their users. 

The ADEPT system consists of three major com- 
ponents: a time-sharing executive; a data manage- 
ment system adapted from SDC’s Time-Shared Data 
Management System (TDMS) described by Bleier, 4 
and a programmer’s package. This paper deals ex- 
clusively with the ADEPT Time-Sharing Executive, 
and particularly with the more novel aspects of its 
architecture and construction. Before examining these 
aspects it will be instructive if we review the basic 
design and hardware configuration of the system. 

A general purpose operating system 

* Development of ADEPT was supported in part by the Advanced
Research Projects Agency of the Department of Defense.

The ADEPT executive is a general-purpose time-sharing
system. The system operates on a 360 Model
50 with approximately 260,000 bytes of core memory, 
4 million bytes of drum memory, and over 250 million 
bytes of disc memory, shown graphically in Figure 
1 and schematically in the appendix. With this machine 
configuration, ADEPT is designed to provide respon- 
sive on-line interactive service, as well as background 
service to approximately 10 concurrent user jobs. It 
handles a wide variety of different, independent ap- 
plication programs, and supports the use of large 
random-access data files. The design, basically a
swapping system, provides for flexibility and expansion
of system functions, and growth to more powerful
models in the 360 family. 

ADEPT functions both as a batch processor (where- 
by jobs are accumulated and fed to the CPU for opera- 
tion one by one) and as an interactive, on-line system 
(in which the user controls his job directly in real 
time simply by typing console requests). 

Viewed as a batch system, ADEPT allows jobs to 
be submitted to console operators or submitted from 
consoles via remote batch commands (remote job 
entry). In either case, jobs are “stacked” for execution 
by ADEPT in a first-in/first-out order. The stack is 
serviced by ADEPT as a background task, subject 
to the priorities of the installation and the demands 
of “foreground” interactive users. Viewed as an interactive
system, ADEPT allows the user to work with
a typewriter, allowing computer-user dialog in real 
time. Via ADEPT console commands, the user iden- 
tifies himself, his programs, and his data files, and 
selectively controls the sequence and extent of opera- 
tion of his job in an ad lib manner. A prime advantage 
of the interactive use of ADEPT is that the system 
provides an extendable library of service programs
that permit the user to edit data files, compile or 
assemble programs, debug and eliminate program 
errors, and generally manage large data bases in a 
responsive on-line manner. 

System architecture 

The architecture of the ADEPT executive is that 
of the “kernel and the shell”. The “kernel,” referred 
to as the Basic Executive (BASEX), handles the 
major problems of allocating and scheduling hard- 
ware resources. It is small enough to be permanently 
resident in low core memory, permitting rapid response 
to urgent tasks, e.g., interrupt control, memory al- 
location, and input/output traffic. The “shell,” re- 
ferred to as the Extended Executive (EXEX), provides 
the interface between the user's application program 
and the “kernel”. It contains those non-urgent, large- 



Figure 1 — Relative capacity of various ADEPT direct-access 
storage media available in less than 0.2 seconds. The initial 
system that operates at SDC utilizes core, 2303 drum, 2311 and 
2314 disc packs, and 2302 disc storage. The NMCSSC system 
utilizes 2314 disc storage in lieu of 2311 or 2302 discs. The archi- 
tecture of the ADEPT executive is such that it permits any 
combination of the above types of disc storage in varying amounts 

task extensions of the basic “kernel” processes that
are user-oriented rather than hardware-oriented; 
they may, therefore, be scheduled and swapped. 

The version of the ADEPT time-sharing system
thus far developed has multiple levels of control
beyond the two-level “kernel-shell” structure; i.e.,
it can be thought of figuratively as an “onion skin”.
Figure 2 shows these relationships graphically. 

Beyond EXEX, “object systems” may exist as
subsystems of ADEPT (developed by the user com- 
munity without modification to EXEX or BASEX), 
thus further distributing and controlling the system 
resources for the object programs that form still 
another level of the system. The design ideas embodied 
in ADEPT parallel those of Dijkstra, 5 Corbato, 6
and Lampson, 7 but differ in techniques of implemen- 
tation. 

The ADEPT Basic Executive operates in the lower 
quarter of memory, thereby providing three quarters 
of memory for user programs. With the current H 
core configuration, ADEPT preempts the first 65,000 
bytes of core memory, the bulk of which is dedicated 
to BASEX; EXEX must then operate in user memory 
in a fashion similar to user programs. ADEPT is
designed to operate itself and user programs as a 
collection of 4096-byte pages. BASEX is identified 
as certain pages that are fixed in main storage and 
that cannot be overlaid or swapped. EXEX and
other programs are identified as sets of pages that 
move dynamically between main storage and swap 
storage (i.e., drum). It is necessary to maintain con- 
siderably more descriptive information about these 
swappable programs than about BASEX. This 
descriptive information is carried in a set of system 
tables that, at any point in time, describe the current 
state of the system and each program. 

ADEPT views the user as a job consisting of some 
number of programs (up to four for the 360/50H 
configuration) that were loaded at the user's request.
These programs may be independent of one another 
or, with proper design, different segments of a larger 
task. Implicitly, EXEX is considered to be one of 
these programs. To simplify system scheduling, com- 
munication, and control, only one program in the 
user's set may be active (eligible to run) at a time. 
When ADEPT scheduling determines that a job may 
be serviced, the current job in core is saved on swap 
storage, and the active program of the next job is 
brought into core from swap storage and executed 
for a maximum period of time, called a quantum. The 
process then repeats for other jobs. Figures 3 and 4 
schematically depict these relationships. 
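
The commutation cycle described above can be illustrated with a small simulation. This is a deliberately simplified sketch, not ADEPT's scheduler: the function name `commutate` is hypothetical, each job is reduced to a remaining run time, and priorities, I/O waits, and swap time are ignored.

```python
from collections import deque


def commutate(jobs, quantum):
    """Run jobs in turn, each for at most one quantum per visit.

    jobs maps a job name to the run time it still needs.
    Returns the (job, time_run) slices in execution order.
    """
    queue = deque(jobs.items())
    trace = []
    while queue:
        name, remaining = queue.popleft()    # swap the job's active program in
        run = min(quantum, remaining)        # execute for at most one quantum
        trace.append((name, run))
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))  # save on swap storage, rejoin cycle
    return trace
```

For example, `commutate({"A": 3, "B": 1}, 2)` gives job A one full quantum, lets B finish, and then returns to A for its last unit of time.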


Figure 3 — Simple commutation of users programs. This figure 
illustrates the relationship between users’ programs, EXEX,
and BASEX. Each spoke represents a user’s job, with his EXEX
providing the interface between BASEX and the hardware
resources. The maximum number of interactive jobs on the
IBM 360/50H configuration is ten.



Figure 4 — ADEPT’S basic sequence of operation. This figure 
shows the basic operating system cycle: idle loop is interrupted 
by an external interrupt (an activity request); a program is 
scheduled, swapped into core from the drum, and executed;
escape from the execution phase occurs when a quantum termination
condition (e.g., time expiration, service or I/O call, error
condition) is met; the program is then swapped out and control
is returned to the idle loop (if no other programs are eligible to 
be scheduled). 

Basic executive (BASEX) 

Table I lists the BASEX components and their 
general functions as of the eighth and latest executive 
release. These basic system components form an 
integrated, non-reentrant, non-relocatable, permanently-resident,
core memory package 16 pages long
(each page is 4096 bytes). They are invoked by hard- 
ware interrupts in response to service requests by 
users of terminals and their programs. Note the 
division of input/output control into cataloged (SPAM 
and IOS), terminal (TWRI), and drum (BXEC) 
activities to permit local optimization for improved 
system performance. 


TABLE I — Basic executive components

Component       Function

ALLOC           Drum and core memory allocation.

BXBUG           Debugger for executive programs.

BXEC            Basic sequence and swap control.

BXECSVC         SVC handlers for WAIT, TIME, DEVICE, STOP
                and DISMISS calls.

EXEX            Linkage routines for EXEX (BASEX/EXEX
                interfaces); also services commands
                DIALOFF, DIALON.

INTRUP          First-level interrupt control.

IOS             Channel-program level input/output
                supervisory control.

RECORD          Records SVC, interrupt activity in BASEX.

SEED            Scheduler.

SPAM            Input/output access methods to cataloged
                storage.

TWRI            Terminal input/output control.

System Tables   Resident system data areas for communication
                table (COMTAB), logged-in user's table (JOB),
                loaded programs table (PQU), drum and core
                status tables (DSTAT, CSTAT), and a variety
                of other tables.


TABLE II — Extended executive components

Component       Function

AUDIT           Maintains a real-time recording of all
                security transactions as an accountability
                log.

BMON            Batch monitor for control of background
                job execution.

CAT             Cataloger for file storage access control;
                also services FORGET command.

DTD             Transfers recording information from
                drum to disc.

DBUG            Debugger for non-executive (user) programs.

LOGIN           User authentication and job creation.

SERVIS          Library of service commands that are
                reentrant, interruptible and scheduled:
                APPEND, CHANGE, CREATE, CYLS, DELETE,
                DRIVES, INIT, LISTF, LISTU, LOAD, LOADD,
                LOAD and GO, OVERLAY, REPLACE, RESTORE,
                RESTORED, SAVE, SEARCH, VARYOFF, VARYON.

RUN             Remote batch job submission control
                servicing commands RUN and CANCEL.

XXTOO           Library of small, fast, executive service
                commands: CPU, BGO, BQUIT, BSTOP, DIAL,
                DRUMS, GO, LOGOUT, QUIT, RESTART, SEED,
                SEEDOFF, STATUS, STOP, TIME, USERS.

SYSDEF          Defines input/output hardware configuration
                at time of system start up.

SYSLOG          Defines authorized user/terminal security
                profiles at time of system start up.

TEST            Initializes system tables at time of
                system start up.

SYSDATA         Non-resident, shared, system data table for
                dial messages and other common data, e.g.,
                lists of all logged-in users; other
                non-resident, job-specific tables also exist,
                e.g., job environment page, push-down list
                data page.


Extended executive (EXEX)

Unlike the tight, closed package of integrated
BASEX components, EXEX is a loose, open-ended
collection of semiautonomous programs. Table II
lists this collection of programs. EXEX is treated
by BASEX as a user program, with certain privileges,
and each user is given his own “copy” of the EXEX.
It is transparent to the user that EXEX is reentrant
and is being shared with other users, except for its
data space. Each job has its own “machine state” 
tables saved in its unique set of environment pages. 
This structure permits flexible modification and orderly 
system expansion in a modular fashion. EXEX is 
always scheduled in the same way as other user pro- 
grams. 

Though EXEX components are, in large part,
non-self-modifying reentrant routines and thus could,
at small cost, be relocatable, neither user programs
nor EXEX components are relocated between swaps.
The lack of any mapping hardware on the IBM 360/50 
and the design goal and knowledge that most user 
programs would be of maximum size made unnecessary 
a software provision to relocate programs dynamically. 
User programs may be relocated once at load time, 
however. 

Communication and control techniques used in ADEPT 

Communication is the generic term used to cover those
services that permit two (or more) programs to intercommunicate,
be they system programs, user programs,
or both. From this communication vantage point we
shall examine the connective mechanism used between 
the Basic and Extended Executives; the techniques 
that allow components within the EXEX to make 
use of one another; and the system design that permits 
an object program to control its own behavior as well 
as to communicate with the system and with other 
object programs. 

The ADEPT job or process 

Before we discuss the system mechanics, let us 
examine how the system treats each user logically. 
A user in the system is assigned a job number. Each 
job in the system may be viewed as a separate process,
and each process is, by definition, independent of all 
other processes running on the machine. A process — 
or job — is not a program. It is the logical entity for 
the execution of a program on the physical processor, 
and it may contain as many as four separate programs. 
A program consists of the set of machine instructions 
swapped into the processor for execution, and the 
Extended Executive is one of these programs. 

The ADEPT executive requires a large number of 
system tables to permit Basic and Extended Execu- 
tive communication. Conceptually, the use of descrip- 
tive tables defining the condition of a user’s process 
is analogous to the state vector (or state word) discussed
by Lampson and Saltzer. 8, 9 That is, the collection
of information contained by these tables is
sufficient to define an inactive user’s process state
at any given moment. By resetting the central proc- 
essor from the state vector, a user’s job proceeds 
from an inactive to an active state as if no interrup- 
tion had occurred. The state vector contains such 
items as the program counter, the processor’s general 
registers, the core and drum map of all the programs 
in the job, and the peripheral storage file data. All 
of the collective data for each program or task in the 
process are contained in the state vector. 
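
The role of the state vector can be sketched as a save and restore pair. The names `StateVector` and `Processor` and the field layout below are illustrative assumptions; the real ADEPT tables are spread over several structures and hold considerably more than is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class StateVector:
    program_counter: int = 0
    general_registers: list = field(default_factory=lambda: [0] * 16)
    page_map: dict = field(default_factory=dict)    # core page -> drum page
    open_files: list = field(default_factory=list)  # peripheral storage file data


class Processor:
    def __init__(self):
        self.program_counter = 0
        self.general_registers = [0] * 16  # System/360 has 16 general registers

    def save(self, page_map, open_files):
        """Capture everything needed to resume the job later."""
        return StateVector(self.program_counter, list(self.general_registers),
                           dict(page_map), list(open_files))

    def restore(self, state):
        """Reset the processor from a saved state vector."""
        self.program_counter = state.program_counter
        self.general_registers = list(state.general_registers)
```

Because `save` copies the registers rather than aliasing them, whatever another job does to the processor in the meantime cannot disturb the saved state, so the resumed job continues as if no interruption had occurred.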

Basic and extended executive communication 

Each ADEPT user (i.e., any person who initiates 
some activity within the system by typing in com- 
mands) is given a job number and assigned an entry 
in the JOB table. The JOB table contains the system’s 
top-level bookkeeping on user activity. It contains 
the user’s identification, his location, his security 
clearance, and a pointer to his program queue. Each 
user is assigned one entry, or JOB, in the table. As- 
sociated with each JOB are the one or more programs 
that the user is running. 

Top-level bookkeeping on programs is contained 
in the Program Queue (PQU) table. Each PQU entry 
contains a program identification and some (but not 
all) information that describes that program in terms 
of its space requirements, its current activity, its 
scheduling conditions, and its relationship to other 
programs in the PQU that belong to the same JOB. 
The detailed descriptive information and the status 
of each JOB and its programs are carried in the swap- 
pable environment space. 

The environment pages (there can be as many as 
four) comprise a number of separate tables that con- 
tain such information as the contents of the general 
registers, the swap storage page numbers where the 
balance of the program resides, the program map, 
and lists of all active data files. A single environment 
page (or pages) is shared by all programs that belong 
to the same JOB (user). The system design allows for 
environment page overflow at which time additional 
pages are assigned dynamically. The environment 
pages, PQU table, JOB table, and data pages com- 
prise the state vector of the user’s job. 

To permit storage of “global” system variables, 
and to allow system components to reference system 
data that may be periodically relocated, there exists 
a system communication table, which resides in low 
core so that it can be referenced without loading a 
base register. 

The IBM 360 supervisor call (SVC) is used exclusively
by EXEX components and object programs to
request BASEX services. Though additional overhead 
is incurred in the handling of the attendant interrupt, 
the centralization of context switching provided is 
of considerable value in system design, fabrication, 
and checkout. 
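
The benefit of routing every request through one entry point can be shown with a toy dispatcher. This is a sketch under stated assumptions: `make_svc_entry` is a hypothetical name, service calls are identified by name rather than by SVC number, and the handlers are ordinary Python callables.

```python
def make_svc_entry(handlers):
    """Return (svc, log): one entry point for all service requests."""
    log = []

    def svc(name, *args):
        log.append(name)  # every request passes through here, so it can be recorded
        handler = handlers.get(name)
        if handler is None:
            raise ValueError(f"unknown service call: {name}")
        return handler(*args)

    return svc, log
```

Since all callers share the single `svc` function, recording, validation, and context switching live in one place, which is the property the text credits with easing design and checkout.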

Extended executive communication

An EXEX may make use of another EXEX function
by use of the SVC call mechanism. To support
the recursive EXEX, an additional SVC processing
routine is required to manage the different recursive
contexts. This routine, called the SVC Dispatcher,
processes calls from user and EXEX functions alike,
manages a swappable data page, and switches to an
interface linkage routine. The data page contains
a system communication stack that consists of a
program’s general registers and the Program Status
Word at the time of the SVC. This technique is
analogous to the push-down logic of recursive procedure
calls found in ALGOL or LISP language
systems. The stack provides a convenient means of
passing parameters between routines in the EXEX.
Since each job has its own unique data page and environment
page, EXEX is both recursive and reentrant.

The environment status table (ESTAT) contains
the swap and core location for each component in
the EXEX and for each program in the job. It resides
in the job environment page. When an EXEX service
is requested, only that particular EXEX program is
brought in from swap storage, rather than the full
service library. The interface linkage routine provides
this management function; it lies as a link between
the SVC Dispatcher and the particular EXEX
function. The interface routine picks up necessary
work pages for the EXEX component involved and
branches to that component after it is brought into
core. The interface routine maintains a separate push-down
stack of return addresses, providing the means
for the EXEX component to properly exit and return
control to its interface routine and then to the system.

The EXEX component called may make additional
EXEX SVC calls before exiting. To provide correct
work page allocation during recursive calls, the interface
routine also saves the work page core and drum
page addresses in the push-down stack. Upon completion
of a call, the EXEX component returns to
its interface routine; the interface routine releases
all allocated work pages to the system and branches
to a common unwind procedure.

The unwind procedure, like the SVC Dispatcher,
is simply a switching mechanism. It determines, via
the stack, whether to return to a still higher level
EXEX function, or to turn the EXEX off and exit
to the Basic Sequence. This recursive/reentrant control
is the most complex portion of ADEPT and is
the “glue” that binds BASEX and EXEX together.
Figure 5 illustrates the recursive process.

Object program communication


One of the more stringent services required of an 
operating system is the rapid interchange of large 
quantities of data between object programs. The 
interchange of even simple arrays, matrices, and tables 
via stack parameters or a common file suffers from the 
inadequacy of limited capacity or extensive I/O time. 
Many operating systems ignore this requirement, 
thereby restricting the general-purpose applications. 
Yet there are solutions to this problem, and one suc- 
cessful technique employed in the ADEPT system is 
that of “shared memory”. Shared memory is achieved 
by using the basic mechanism for managing reentrancy, 
namely the program environment page map. Through 
the ADEPT SHARE Page call, an object program 
can request that designated pages of another program 



Figure 5 — Block diagram of EXEX behavior and 
control 
in the job be added to its map. If core page numbers
are passed as parameters in various service calls, whole 
pages of data may be passed between programs. EXEX 
and many object programs operating under this system 
use this method for inter-program communication. 
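
Page sharing through the page map can be pictured as two maps that refer to the same page. The sketch below is only an analogy: `Program` and `share_page` are hypothetical names, a page is modeled as a 4096-byte `bytearray`, and the function stands in for the SHARE call without reproducing its actual parameters.

```python
PAGE_SIZE = 4096  # bytes per page, as in ADEPT


class Program:
    def __init__(self, name):
        self.name = name
        self.page_map = {}  # page number in this program -> page contents

    def add_page(self, number):
        self.page_map[number] = bytearray(PAGE_SIZE)


def share_page(requester, owner, owner_page, as_page):
    """Add one of the owner's pages to the requester's map (no copy is made)."""
    requester.page_map[as_page] = owner.page_map[owner_page]
```

After the call, a store by either program is visible to the other at once, so a whole page of data changes hands without stack parameters or file I/O.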

ADEPT operating on the IBM 360/50H restricts 
its user programs to 46 active core pages. However, 
by utilizing the GETPAGE call, an object program 
may acquire up to 128 drum pages and may subse- 
quently activate and deactivate various page sets 
by utilizing another service call, ACTDEACT (activate/deactivate).
This scheme permits bulk data from
disc storage to be placed on drum and operated upon 
at “swap” speeds. Thus skilled system users can 
achieve efficient use of time and memory by managing 
their own “paging”. We consider this the best alternative,
given the questionable state of other, automatic
paging algorithms. 10, 11, 12, 13 Most EXEX components
use these calls for just such purposes. For
example, the interface routines mentioned above use 
activate calls to “turn on” called components of the 
EXEX. 
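
User-managed paging of this kind amounts to bookkeeping over two sets: the drum pages a program has acquired and the subset currently active. The following sketch assumes a hypothetical `PageSet` class whose methods are named after the GETPAGE and ACTDEACT calls; their argument conventions are invented for illustration, and only the two limits (46 and 128) come from the text.

```python
MAX_ACTIVE_CORE_PAGES = 46   # limit quoted for the 360/50H
MAX_DRUM_PAGES = 128         # limit quoted for GETPAGE


class PageSet:
    def __init__(self):
        self.drum = set()    # drum pages the program has acquired
        self.active = set()  # pages that occupy core when the program runs

    def getpage(self, count):
        if len(self.drum) + count > MAX_DRUM_PAGES:
            raise MemoryError("drum page limit exceeded")
        first = len(self.drum)
        pages = list(range(first, first + count))
        self.drum.update(pages)
        return pages

    def actdeact(self, activate=(), deactivate=()):
        wanted = (self.active - set(deactivate)) | set(activate)
        if not wanted <= self.drum:
            raise KeyError("page was never acquired")
        if len(wanted) > MAX_ACTIVE_CORE_PAGES:
            raise MemoryError("too many active pages")
        self.active = wanted
```

A program holding 100 drum pages can thus work through them in windows of at most 46, deactivating one group in the same call that activates the next.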

The Allocator component of ADEPT manages the 
page map for each program. This software map re- 
flects the correspondence between drum and core 
pages, established initially by the SERVIS (service)
component at load time. The Allocator's function is 
to inventory available core and drum pages by main- 
taining two resident system tables: one for core, the 
other for drum. Whenever drum pages are released 
or obtained, the Allocator updates the page map in 
the job's environment page. The Allocator processes 
the SHARE (page), GETPAGE, FREEPAGE, and 
ACTDEACT calls from EXEX and object programs. 
SERVIS allows a program at run time to add data 
pages or to overlay program segments from disc or 
tape. In so doing, SERVIS makes use of the various 
Allocator calls. 

Simulating console commands 

An important attribute of ADEPT time-sharing 
is that nearly all the functions and services that can 
be initiated at the user's console can also be called 
forth within a user's program. A program designer 
can, for example, build a system of programs, which 
can operate in batch mode under the control of a pro- 
gram by issuing internal commands in much the same 
manner as the user sitting at the console. With this 
approach, the ADEPT batch monitor controls back- 
ground tasks by simulating user terminal requests. 
Batch requests can be enqueued by users from any
console and then processed in turn by this supervisor
function. 

Armed interrupts and rescue function 

The basic design of ADEPT conveniently provides 
for processing object program “armed” interrupt 
calls. This means that an object program is able to 
conditionally start (wakeup) and stop (sleep) the 
execution of its own programs, and others as well. 
The conditions for employing wakeup calls include 
too much elapsed time, or the occurrence of unpredictable
but anticipated events, e.g., errors and other
program calls. In “arming” these “software-inter- 
rupt” conditions by object program calls, the program 
entry point(s) for the various conditions are specified.
When such conditions occur, the operating system 
transfers to the specified entry point and gives the 
appropriate condition code. (Note that if we take this 
call one step further, and permit one object program 
to arm the software and hardware interrupts of another 
object program, we have the basic control mechanism 
necessary to permit the operation of “object systems,” 
i.e., subexecutives — another level in the “onion skin” 
of ADEPT control.) 
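
The arming mechanism resembles registering a handler for a named condition. In the sketch below, `ObjectProgram`, `arm`, `disarm`, and `signal` are hypothetical names, an entry point is modeled as a Python callable, and the condition names and codes are invented for illustration.

```python
class ObjectProgram:
    def __init__(self):
        self._armed = {}  # condition name -> entry point (a callable here)

    def arm(self, condition, entry_point):
        self._armed[condition] = entry_point

    def disarm(self, condition):
        self._armed.pop(condition, None)

    def signal(self, condition, code):
        """Called by the system when a condition occurs."""
        entry = self._armed.get(condition)
        if entry is None:
            return False  # not armed: default system handling applies
        entry(code)       # transfer to the entry point with a condition code
        return True
```

Letting one program call `arm` on another program's table, rather than only its own, is the extension the text identifies as the basis for object systems.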

User programs interface with the ADEPT system 
primarily via the supervisor call (SVC) instruction; 
a secondary interface is provided via the program 
check interrupt that protects the program and system 
after various error conditions. The executive design 
allows user programs to trap all such interfaces with 
the system via its rescue arming mechanism. This 
means that one program can trap and get first-level 
control of all occurrences of SVC's and program checks 
within a single job. This mechanism also means, then, 
that the responsibility and meaning for these inter- 
faces can be redefined at the user program level. 

As of this writing, this mechanism is being employed 
to construct object systems for an improved batch 
monitor, an interface for the proposed ARPA Network, 14
and to experiment with automatic translators
for compatibility with other operating systems. Other 
uses include improvements in program recovery in 
a variety of user tools, e.g., compiler diagnostics. 

Resource allocation, access, and management

ADEPT system design, of course, includes a com- 
plete set of resource controls that monitor secondary 
storage devices. 
The cataloger

The Cataloger, an EXEX component, is functionally 
analogous to the core/drum Allocator, but is used 
for devices accessible by user programs. It maintains 
an inventory of all assignable storage devices, assigns 
unused storage on the devices, maintains descrip- 
tions of the files placed on these devices, controls 
access to these files, and — upon authorized request — 
deletes any file. Specifically, the Cataloger: 

• Assigns storage on 2302, 2311 and 2314 discs. 

• Assigns tape drives. 

• Locates an inventoried file by its name and certain
qualifiers that uniquely identify the file.

• Issues tape or disc pack mounting instructions
to the operator when necessary. 

• Verifies the mounting of labeled volumes. 

• Passes descriptive information to the user pro- 
gram opening a file. 

• Allows the user of a file to request more storage 
for the file. 

• Denies unauthorized users access to files.

• Returns assigned storage to available storage 
whenever a file is deleted. 

• Maintains a table of contents on each disc volume. 

As the largest single component of the ADEPT 
Executive (65,000 bytes), the Cataloger was written
in a new, experimental programming language called 
MOL-360 (Machine-Oriented Language for the 360). 15
It is a “higher-level machine language” developed 
under an ARPA-sponsored SDC research project on 
metacompilers. It resolved the dilemma involving 
our desire for higher-level source language and our 
need to achieve flexibility with machine code. The 
Cataloger design and checkout, enhanced by the use 
of MOL-360, showed simultaneously the validity 
of MOL compilers for difficult machine-dependent 
programming. 

The SPAM component 

SPAM is a BASEX component that permits sym- 
bolic, user-oriented I/O. It can be viewed as a special- 
purpose compiler that compiles symbolic user program 
I/O calls into 360 channel programs, and delivers them 
to the Input/Output Supervisor (IOS) for execution 
via the EXCP (execute channel program) call. The
results of EXCP for the call are “interpreted” by
SPAM and returned to the user program as status in- 
formation. As such, SPAM represents a more symbolic 
I/O capability than the EXCP level. It provides a 
relatively simple method for executing the operations 
of reading, writing, altering, searching for, and po- 
sitioning records within ADEPT cataloged and con- 
trolled disc-based and tape-based file structures. 

Resource management 

As of this writing, the computer operator has a set 
of commands at his disposal that allow him to control 
the system resources. Various privileged on-line com- 
mands enable him to monitor the terminal activities 
of system users and to control assignment and availa- 
bility of storage devices. However, there is an in- 
creasing need for a “manager” to be given more 
latitude in dynamically controlling the system re- 
sources and observing the status of system users, 
particularly because ADEPT was designed to handle 
sensitive information in classified government and 
military facilities. To meet these objectives, a design 
effort is under way that gives the computer operator 
system-manager status, with the ability to observe 
and control the actions of system users. The result 
will be a program that encompasses some of the man- 
agement techniques reported by Linde and Chaney 16 
tailored to present needs. 

Swapping and scheduling user programs 

Most of the programs that run under ADEPT 
occupy all of the core memory that is not used by 
the resident Basic Executive (46 pages on the 360/50H).
If the set of needed pages could be reduced, a
considerable reduction in swap overhead could be
expected. One way to achieve this is to mark for swap- 
out only those pages that were changed during pro- 
gram execution. The hardware needed to automatically 
mark changed pages is unavailable for the 360/50; 
however, through use of the store-protect feature on 
the Model 50, ADEPT software can simulate the ef- 
fect and produce noteworthy savings in swap time. 

Page marking 

Whenever a user program is swapped into core, its 
pages are set in a read-only condition. As the program 
executes, it periodically attempts to store data (write) 
in its write-protected pages. The resulting interrupt 
is fielded by the system. After satisfying itself that 
the store is legal for the program, the executive marks 
the target page as “written,” turns off write-protect 
for that page, and resumes the program’s execution.
The situation repeats for each additional page written. 
At the completion of the program’s time slice, the 
swapper has a map of all the program pages that 
were changed (implied in the storage keys with no 
write protection). Only the changed pages are swapped 
out of core. Measurement of this scheme shows that 
about 20 percent of the pages are changed; hence, 
for every five pages swapped in, only one need be 
swapped out, for a total swap of six pages, rather 
than the full swap of ten pages (five in, five out). The 
scheme makes the drum appear to be 40 percent faster. 
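The page-marking cycle and the swap arithmetic above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class and function names are hypothetical, and an ordinary method call stands in for the store-protect interrupt that the executive fields on the Model 50.

```python
class Page:
    def __init__(self):
        self.write_protected = True   # pages are read-only after swap-in
        self.written = False
        self.data = 0


class Program:
    def __init__(self, n_pages):
        self.pages = [Page() for _ in range(n_pages)]

    def swap_in(self):
        # Every page starts the time slice write-protected and unmarked.
        for page in self.pages:
            page.write_protected = True
            page.written = False

    def store(self, page_no, value):
        page = self.pages[page_no]
        if page.write_protected:
            # Stand-in for the protection interrupt: mark the page as
            # "written" and turn off write-protect for that page only.
            page.written = True
            page.write_protected = False
        page.data = value

    def pages_to_swap_out(self):
        # The swapper writes back only the pages that were changed.
        return [i for i, page in enumerate(self.pages) if page.written]


def swap_traffic(pages_in, fraction_changed):
    """Total pages moved when only changed pages are swapped out."""
    return pages_in + round(pages_in * fraction_changed)


prog = Program(5)
prog.swap_in()
prog.store(3, 42)
prog.store(3, 43)                     # same page again: no second interrupt
changed = prog.pages_to_swap_out()

marked = swap_traffic(5, 0.20)        # five in, one out
full = 5 + 5                          # five in, five out
saving = 1 - marked / full            # fraction of drum traffic avoided
```

With the measured 20 percent figure, six pages move instead of ten, which is the 40 percent reduction in drum traffic quoted above.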

The use of the storage protection keys is based on 
the functional status of each page rather than on 
some user identity. User programs always run with 
a program status word key of one, and the bits in 
the storage key associated with the programs start 
out at zero. After a page has been initially changed, 
its key is set to one also. The other bits in the key are 
used to indicate: first, a page is transient, not yet
completely moved to or from swap storage; second, 
a page is unavailable, i.e., it belongs to someone else; 
third, a page is locked and cannot be swapped or 
changed; and finally, a page is fetch-protected because 
it may contain sensitive information. 

Scheduling algorithm 

The scheduling algorithm provides for three levels 
of scheduling. Jobs that are in a “terminal I/O com- 
plete” state get first preference in the schedule. Jobs 
in the second level, or background queue, are run if 
there are no level-one jobs to run. A job is placed in 
level two when the two-second quantum clock alarm 
terminates its operation two consecutive times. Com- 
pute and I/O-bound programs are treated alike. A 
level-two job, when allowed to run, is given a quantum
interval equal to the basic quantum time multiplied
by the scheduling level (i.e., 2 sec × 2 = 4 sec).
However, a level-two background job may be pre- 
empted after two seconds for terminal I/O. Any opera- 
tion a level-two job makes that terminates its quan- 
tum prematurely will return the job to a level-one 
status. The batch monitor job is run when the first 
two queues are empty. User programs may be written 
to overlap execution and I/O activity. Our choice of
scheduling parameters (quantum size and number of
service levels) was made empirically and as a
result of prior experience. 17
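The queue discipline described above can be sketched as follows. This is a minimal illustration, not the ADEPT scheduler itself: the names `Job`, `Scheduler`, `quantum_expired` and `ended_early` are hypothetical, a job that gives up its quantum early is simply requeued at level one, and the two-second preemption of level-two jobs for terminal I/O is not modeled.

```python
from collections import deque
from dataclasses import dataclass

BASIC_QUANTUM = 2.0  # seconds, as stated in the text


@dataclass
class Job:
    name: str
    level: int = 1
    consecutive_expiries: int = 0


class Scheduler:
    def __init__(self):
        self.level_one = deque()
        self.level_two = deque()   # background queue

    def add(self, job):
        queue = self.level_one if job.level == 1 else self.level_two
        queue.append(job)

    def pick(self):
        """Return (job, quantum). (None, None) means both queues are
        empty, which is when the batch monitor job would run."""
        if self.level_one:
            job = self.level_one.popleft()
        elif self.level_two:
            job = self.level_two.popleft()
        else:
            return None, None
        # Quantum is the basic quantum multiplied by the scheduling level.
        return job, BASIC_QUANTUM * job.level

    def quantum_expired(self, job):
        # Two consecutive quantum-clock terminations demote to level two.
        job.consecutive_expiries += 1
        if job.consecutive_expiries >= 2:
            job.level = 2
        self.add(job)

    def ended_early(self, job):
        # Ending a quantum prematurely returns the job to level one.
        job.consecutive_expiries = 0
        job.level = 1
        self.add(job)
```

A job that twice runs out its two-second quantum is next offered four seconds from the background queue, and drops back to a two-second quantum at level one as soon as it ends a quantum early.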

A command SKED, which is limited to the opera- 
tor’s terminal, has the effect of forcing top priority 
for a job (the job stays at level one all the time). Only
one job may run in this privileged scheduling state
at a time.

Pervasive security controls 

Integrated throughout the ADEPT executive are 
software controls for safeguarding security-sensitive 
information. The conceptual framework is based 
upon four “security objects”: user, terminal, file, 
and job. Each of these security objects is formally 
identified in the system and is also described by a 
security profile triplet: Authority (e.g., TOP SE- 
CRET, SECRET), Need-to-Know Franchise, and 
Special Category (e.g., EYES ONLY, CRYPTO). 
At system initialization time, user and terminal 
security profiles are established by security officers 
via the system component SYSLOG. SYSLOG also 
permits the association of up to 64 passwords with 
each user. At LOGIN time, a user identifies himself 
by his unique name, up to 12 characters, and enters 
his private password to authenticate his identity. The 
LOGIN component of ADEPT validates the user 
and dynamically derives the security profile for the 
user’s job as a complex function of the user and ter- 
minal security profiles. The job security profile is 
used subsequently as a set of “keys,” used when access 
is made to ADEPT files. The file security profile is 
the “lock” and is under control of the file subsystem. 

File access Need-to-Know is permitted for Private, 
Semi-Private, and Public use. With the CREATE 
command, a list of authorized users and the extent of 
their access authorization (i.e., read-only, write-only, 
read and write) can be established easily for Semi- 
Private files. Newly created files are automatically 
classified with the job’s “high water mark” security 
triplet — a cumulative security profile history of the 
security of files referenced by the job. Through ju- 
dicious use of the CHANGE command, these proper- 
ties may be altered by the owner of the file. 
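One way to picture a "high water mark" is as a running maximum over the files a job has referenced. The sketch below is an assumption-laden illustration rather than ADEPT's actual rule: the text does not say how the cumulative triplet is formed, so the ordering of Authority levels, the use of set union for Special Categories, the omission of the Need-to-Know Franchise, and all names (`HighWaterMark`, `reference`, `AUTHORITY_ORDER`) are hypothetical.

```python
from dataclasses import dataclass

# Assumed ordering, lowest to highest; only SECRET and TOP SECRET
# are named in the text.
AUTHORITY_ORDER = ["UNCLASSIFIED", "CONFIDENTIAL", "SECRET", "TOP SECRET"]


@dataclass
class HighWaterMark:
    authority: str = "UNCLASSIFIED"
    categories: frozenset = frozenset()

    def reference(self, file_authority, file_categories=()):
        """Fold one referenced file's profile into the running mark."""
        if (AUTHORITY_ORDER.index(file_authority)
                > AUTHORITY_ORDER.index(self.authority)):
            self.authority = file_authority
        # Assumption: categories accumulate and are never dropped.
        self.categories = self.categories | frozenset(file_categories)


hwm = HighWaterMark()
hwm.reference("SECRET", ["CRYPTO"])
hwm.reference("CONFIDENTIAL")             # a lower file does not lower the mark
hwm.reference("TOP SECRET", ["EYES ONLY"])
```

Under these assumptions, a file created after the three references would be classified TOP SECRET with both categories attached, until its owner alters the properties with CHANGE.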

Security controls are also involved in the control 
of classified memory residue. Software and hardware 
memory protection is extensively used. Software 
memory protection is achieved by interpretive, le- 
gality checking of memory bounds for I/O buffer 
transfers, legality checking of device addresses for 
unauthorized hardware access, and checks of other 
user program attempts to seduce the operating system 
into violating security controls. 

The hardware protection keys are used to fetch- 
protect all address space outside the user program and 
data area. Also, newly allocated space to user programs 
is zeroed out to avoid classified memory residue. 





Typically, the complete system reaches “on the air” 
status in less than a minute. 

System instrumentation 

Many of the parameters built into the scheduling 
and swapping of early ADEPT versions were based 
upon empirical knowledge. The latest versions of 
the Basic and Extended Executives include routines 
to record system performance, reliability, and security 
locks. 

Built into the BASEX is a routine to measure the 
overall and the detailed system performance. 20 Such 
factors as the number of users, file usage, hardware 
and software errors, and page transaction response 
time are recorded on unused portions of the 2303 
drum. These measurements provide a better under- 
standing of the system under a variety of inputs and 
give the designers insight into how the hardware and 
software components of the system affect the per- 
formance of the human user. 

An AUDIT program was made part of the EXEX 
to record the security interaction of terminals, users, 
and files. AUDIT records EXEX activity in the areas 
of LOGIN, LOGOUT, and File Manipulation. This 
routine strengthens the security safeguards of the 
executive. Specific items that are recorded involve: 
type of event, user identification, user account num- 
ber, job security, device identification, time of event, 
file identification, file security and event success. In 
addition, this routine provides accounting informa- 
tion and is used as a means of debugging the security 
locks of new system releases. 

In addition to the BASEX recording function, 
several object programs have been written that simu- 
late various modes of user activity and provide con- 
trolled job distributions. These programs, called 
“benchmarks,” run under controlled conditions and 
enhance the means of improving system performance 
and throughput, as described elsewhere by Karush. 21
The programs are designed to gather performance 
measures on the major routines of the executive and 
have been of considerable help in system “tuning,” 
because they reflect the effect of coding and design 
changes to various system routines. The routines in 
the executive that are of primary concern are the 
swapper, the scheduler, the terminal read/write
package, and the interrupt handling processes. Attempts
are being made to design a set of benchmarks that 
represent a typical job mix. However, we are primarily 
interested in measuring the performance of our system 
against various modifications of itself and in measuring 
its behavior with respect to different job mixes. 


SUMMARY 

The ADEPT executive is a second-generation, general- 
purpose, time-sharing system designed for IBM 360 
computers. Unlike the monolithic systems of the past, 1 - 2 
it is structured in modular fashion, employing distrib- 
uted executive design techniques that have permitted 
evolutionary development. This design has not only 
produced a flexible executive system but has given the 
user the same facilities used by the executive for 
controlling the behavior of his programs. ADEPT's
security aspects are unique in the industry, and the 
testing and fabrication methods employ a number 
of novel approaches to system checkout that con- 
tribute to its operational reliability. 

It is important to note that this system deals par- 
ticularly well with size limitation problems of very 
large files and very large programs. The provisions 
made for multiple programs per job, active/inactive 
page status for programs larger than core size, page 
sharing between programs, common file access across 
programs within jobs, and the commitment of con- 
siderable space to active file environment tables (up 
to four pages worth) contribute to this success. Never- 
theless, all these capabilities are designed to handle 
the smaller entities as well. We feel ADEPT-50 is 
a significant contribution to the technology of general- 
purpose time-sharing. 
INTRODUCTION

The real-time digital process control system, of which 
the Partition Share Supervisor is an operational feature, 
was designed and implemented to assist in the functions 
of monitoring, evaluating and controlling an inter- 
connected system of electrical power utility com- 
panies. The main processing unit is located at the 
central control office with teleprocessing communi- 
cations to remote lower level control centers. 

The basic addressable unit within the main processor 
is the byte (8 data bits + 1 parity bit), with a word 
consisting of four bytes. There is a storage protect 
option which is implemented through assignment of 
storage and “keys” to contiguous 2048 byte blocks of 
memory. A group of memory blocks with matching 
protect keys comprise a partition or task area. This 
protection feature permits non destructive read-out 
across partition boundaries but will cause termination 
of any task which attempts to write in another task’s 
memory area. 

The arithmetic-logic unit maintains its current status 
in a program status word which contains such information
as whether or not I/O is currently being permitted
on each of the data channels, the protect key for
the instruction presently being executed, present
machine status, length of current instruction, the ad- 
dress of the next instruction to be fetched, etc. There 
are certain instructions within the instruction set 
which can only be executed when the machine is in 
the “supervisor” state, i.e., when the portion of the 
program status word which indicates machine status 
is correctly set. These instructions are classified as 
“privileged” instructions and perform such functions 
as disabling data channel interrupts, altering storage 
keys, resetting the program status word, etc. 

The ability of the computer to disallow certain of 
its instructions when operating in the normal problem 
program state prevents inadvertent destruction of 
critical storage area or catastrophic conditions being 
caused by problem programs which could lead to 
system shutdown. 

This system utilizes the independent I/O channel 
concept which permits the main processor to continue 
execution of program instructions while the channel 
transfers data from I/O devices into main storage by 
cycle interleaving. 

The multi-tasking capability of the
manufacturer-supplied software support system permits priority
scheduling of several tasks all utilizing the resources of
one processing unit. The design of the real-time control
system requires that it perform certain of its functions
on a cyclic basis. Therefore, the internal storage has been
divided into four task areas (partitions) with time de- 
pendent and critical programs placed in partitions 
with relatively higher priorities. The following task de- 
scriptions are listed in order of task priorities: 

Task 1 (core requirement = 42K)

Task 1 is dedicated to the manufacturer-supplied
operating system (O/S), which contains supervisory
routines, data management routines, priority scheduler,
etc.

Task 2 (core requirement = 72K) 

Task 2 incorporates the process control family of
programs. It also includes the remote typewriter/card
reader communications programs since they use little
processing time and benefit from both the independence
of input/output channel operations and quick response
time available to the task. During power system
emergency situations, Task 2 additionally initiates
routines which, due to their critical nature, retain
system resources and dispatch emergency communications
until the disturbance is relieved.



Task 3 (core requirement = 40K) 

Task 3 contains special digital console message pro- 
cessing routines, text output generators for programs 
operational within Task 2, routines for processing card
inputs from the telecommunications system and rou- 
tines which monitor and control inter-task communi- 
cations. 

Task 4 (core requirement = 6K) 

Task 4 is the Partition Share Supervisor (PSS)
which causes Tasks 5 and 6 to share the remaining 
available memory. The detailed description of this 
task is the subject of this paper. 

Task 5 (core requirement = 96K) 

Task 5 consists primarily of scientific application 
programs. These programs are run as required either on 
special demand from real-time on-line tasks or peri- 
odically with the length of the period depending on 
the nature of the program. 


Figure 1 — Initial memory configuration with task 
functional descriptions and relative locations shown 

Task 6 (core requirement = 96K)

This task is the off-line* task and is dedicated for 
miscellaneous uses such as compiles, assemblies, ac- 
counting routines, etc. 

Figure 1 is a functional diagram of the tasks just 
discussed and shows their relative locations in com- 
puter memory. 
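As a quick check on the figures listed above, the core requirements can be tallied, counting the 96K area once because Tasks 5 and 6 share it. The dictionary labels are only shorthand for this sketch.

```python
# Core requirements in K bytes, as listed in the task descriptions.
core_k = {
    "Task 1 (O/S)": 42,
    "Task 2 (process control)": 72,
    "Task 3 (console and message processing)": 40,
    "Task 4 (PSS)": 6,
    "Tasks 5/6 (shared area)": 96,
}

total_k = sum(core_k.values())
resident_k = total_k - core_k["Tasks 5/6 (shared area)"]
```

The listed areas total 256K, of which 160K is permanently resident and 96K is the area that the PSS switches between the two lowest priority tasks.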

General discussion 

Task dispatching 

Task dispatching is under the control of the oper- 
ating system. From a conceptual standpoint, the 
operating system can be considered to be the only 
main program in storage and all other tasks within 
the computer as subroutines. 

* The term off-line is used in this paper when referring to tasks 
which do not directly operate within the real-time environment. 
This use is similar to the term “background” which the reader 
may have previously encountered. 
The dispatching function consists of allocating the
resources of the processor to the highest priority task 
which is in the “ready” state. When no tasks are in 
the ready state, the processor is not working and is in 
a wait state. When any task reaches a point where it 
no longer can process until the completion of some 
event (such as an I/O operation), it relinquishes con- 
trol of computer facilities to lower priority tasks via 
the scheduler. It will regain these facilities when the 
event it is awaiting is completed and there are no 
higher priority tasks which are in the ready state. 
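The dispatching rule just described amounts to picking the highest priority ready task, or idling when none is ready. A minimal sketch follows, assuming a hypothetical `(rank, name, ready)` tuple per task in which rank 1 is the highest priority.

```python
def dispatch(tasks):
    """Return the name of the task to run, or None for the wait state.

    tasks: iterable of (rank, name, ready); a lower rank means a
    higher priority.
    """
    ready = [task for task in tasks if task[2]]
    if not ready:
        return None                      # processor enters a wait state
    return min(ready, key=lambda task: task[0])[1]


tasks = [
    (2, "Task 2", False),   # waiting on an I/O completion
    (3, "Task 3", True),
    (6, "Task 6", True),
]
running = dispatch(tasks)
idle = dispatch([(2, "Task 2", False)])
```

Here Task 3 runs because Task 2 is waiting; once the event Task 2 awaits completes, the next dispatch would select Task 2 again.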

Inter partition communication 

The subject real-time system requires that oper- 
ational tasks be able to communicate for the purpose 
of exchanging information such as live data, requests 
to run various subtask routines, etc. Tasks which 
communicate with other tasks are equipped with inter- 
task communication routines which are considered the 
highest priority routines within the individual task. In 
this fashion, when the task is dispatched, the internal 
task priority scheme allows the communication routines 
to be processed first. Furthermore, any task can be 
interrupted to allow its communication routines to 
operate. Thus tasks can communicate at any time
(asynchronously).

Partition sharing 

The Partition Share Supervisor (PSS) is required to
be able to handle three basic functions:

1. Suspend processing of the off-line task when
required.

2. Load and process the lowest priority on-line
task (LPOL).

3. Upon completion of (2) above, be able to restore
and restart the off-line task.

Figure 2 — Showing memory configuration when low
priority on-line (LPOL) task is active


There are two conditions under which PSS suspends
off-line processing. One is when the previously set
real-time clock causes an interrupt. This interrupt is
recognized as indicating the LPOL is to be recycled
for a periodic run. The other is when a communication
is received from another task indicating that one of the
routines within the LPOL task is to be executed.

Figure 1 shows the computer configuration in the
normal mode. Normal mode is considered to be when
the shared partition is occupied by off-line programs.
Note that there are four problem program partitions
(excluding the nucleus).

Figure 2 shows the configuration when the off-line
programs are “rolled out” and the LPOL programs
are operational. There are now three problem program
partitions, and the area dedicated to the PSS and LPOL
tasks is one contiguous partition.

Detailed discussion

The following description details the operations involved
in reconfiguring the system from that of
Figure 1 to that of Figure 2 and returning to that of
Figure 1.

As previously stated, the PSS task is initiated for
one of two reasons:

1. Timer interrupt indicating a need to run the 
LPOL task for time dependent programs. 

2. External interrupt triggered by communication 
from another task indicating a need to process 
a requested program. 

Prior to either type of interrupt, the PSS task is 
in a wait state (i.e., the task cannot be dispatched 
until the completion of one of the above two events). 
Upon being initiated, PSS takes the following steps:

1. Places its own task in the supervisor state in 
order to allow execution of privileged instructions 
required to modify system control blocks in the 
nucleus, override the storage protection feature, 
and disable system interrupts at critical times. 

2. Allows all outstanding I/O to complete in the 
off-line partition (quiescing the partition). 

3. Erases the boundary between the PSS task 
and off-line task. 

4. Deletes reference to the now non-existent off- 
line task from operating system control blocks. 

5. Writes a copy of the off-line partition, which is 
now an extension of the memory area of the 
PSS task, on a disc file. 

6. Reads the LPOL task into the vacated area.

7. Executes the LPOL task. 

At this point, we have gone from the configuration 
shown in Figure 1 to that of Figure 2 and the LPOL 
task is now able to process its requests. Upon com- 
pletion by the LPOL task of all required processing, 
the following steps are taken by PSS to return to the 
off-line configuration:

8. Writes the LPOL task on a disc file. 

9. Reads the off-line task into the vacated area. 

10. Re-establishes task boundaries erased in 3. 

11. Restores system reference to the off-line task. 

12. Places the PSS task in a “wait state” awaiting 
an interrupt which will cause a recycle. 

At this point, the off-line task is fully restored to the 
system and in a “ready state”. It will then be redis- 
patched by the task dispatching routines on a priority 
basis. 

System control blocks 

Prior to a detailed discussion of PSS mechanics, we 
will discuss relevant system control blocks utilized in 
effecting partition sharing. 

Task Control Block (TCB) 

There is a TCB associated with each task. Contained 
in the TCB are various boundaries, indicators, etc., 
used in performing task control. Figure 3 shows those 
fields (with references labeled as used in this paper) 
which are accessed or modified by PSS. 

TCB List (TCBLIST)

The TCBLIST is located in the nucleus and is a
list of TCB locations in order of task priority. There
is an entry in the list for each task in the system (see
Figure 4).

FIELD      COMMENTS

TCBTABB    POINTER TO TASK AREA BOUNDARY
           BLOCK (SEE FIG. 5)
TCBPKE     CONTAINS STORAGE PROTECTION KEY
           FOR THE TASK
TCBIDF     TASK IDENTIFICATION NUMBER
TCBTCB     POINTER TO NEXT LOWER PRIORITY
           TASK TCB

Figure 3 — Task control block (TCB)

Task Area Boundary Block (TABB) 

There is a TABB associated with each task. The 
TABB contains addresses defining the upper and lower 
boundaries of the task region and also has a pointer 
to the first free area label within the task. The format 
of a TABB is shown in Figure 5.

Free Area Label (FAL) 

There is an FAL which is an integral part of every
available free storage area in memory. An FAL is
effectively a label for each free storage area which
defines its size and contains a linkage pointer to
the next FAL. The format of an FAL is shown in
Figure 6.

POINTER TO TCB OF HIGHEST PRIORITY TASK
POINTER TO TCB OF NEXT HIGHEST PRIORITY TASK
POINTER TO TCB OF LOWEST PRIORITY TASK

Figure 4 — TCB list (TCBLIST)

LABEL      COMMENT

FALPT      POINTER TO FIRST FREE AREA
           LABEL (FAL) WITHIN TASK
           AREA. (SEE FIGURE 6)
LOADDR     THE ADDRESS OF THE LOW
           BOUNDARY OF THE TASK.
HIADDR     THE ADDRESS OF THE HIGH
           BOUNDARY OF THE TASK.

Figure 5 — Task area boundary block (TABB)


Input/Output Request Element (IORE) 

There is a chain of IOREs for all outstanding or 
queued I/O operation requests from any partition. 
Each IORE contains information used by the system 
I/O interrupt handling routines as I/O operations are 
completed. Figure 7 shows the format of an IORE. 


System Vector Table (SVT) 

The SVT is resident in the nucleus and contains 
essential pointers required by the operating system. 
Included is a pointer to the start of the IORE chain. 
The location of the SVT is retrieved from a fixed memo- 
ry location which is conditioned with the SVT address 
during system initialization. 

As mentioned under General Discussion, the PSS 
task is required to run in supervisor state at times. 
Although the state of the PSS task changes from 
problem to supervisor and back throughout its exe- 
cution, these changes of state will not be noted in 
this discussion. It should be understood that PSS 
operates in problem state at all times where it is not 
required to be executing privileged instructions, modi- 
fying storage in another partition or the nucleus, or 
disabling I/O interrupts. 


FALNXT     POINTER TO NEXT FAL IN THE
           CHAIN OF FALS.
           NOTE: IF THIS FIELD IS ALL
           ZEROS, THIS IS THE LAST FAL
           IN THE CHAIN.
FALCOUNT   AMOUNT OF FREE MEMORY
           AVAILABLE STARTING AT THE
           BEGINNING OF THIS FAL.

Figure 6 — Free area label (FAL)

IORESTAT   STATUS INDICATOR FOR THIS
           IORE. THE LAST IORE IN THE
           CHAIN HAS AN IORESTAT FIELD
           WITH A VALUE OF 1.
IOREID     FIELD SET TO SAME ID NUMBER
           AS THAT OF THE TCBIDF FIELD
           OF THE TASK WHICH INITIATED
           I/O REQUEST (SEE FIGURE 3)

Figure 7 — I/O request element (IORE)

Quiescing a partition 

Prior to rolling out the off-line partition, PSS must 
be sure all I/O is quiesced in order to prevent the I/O 
supervisor routines from accessing some storage area 
which is in a transitory state. 

There is an IORE for all outstanding and queued 
I/O requests. Within each IORE is an identification 
number field (IOREID — see Figure 7) which links it 
with the initiating task. When that task is involved in 
an I/O operation, the TCBIDF field of the TCB 
(Figure 3) has a task identification number that will 
match the IOREID field of some active IORE. 

As I/O interruptions occur, the I/O Interrupt Handler
services the interrupt, removes the appropriate
IORE from the chain, and makes it inactive.

Partition quiescing is accomplished by initially dis- 
abling I/O interrupts, obtaining the TCBIDF field 
from the TCB of the task involved, locating the IORE 
chain by using the pointer in the SVT, and scanning 
the IOREs checking for IOREID fields which match 
the TCBIDF field of the TCB. If none are found, there 
are no IOREs for the task and it is already in a quiescent 
state. If any are found, then the task has a pending 
I/O interrupt or outstanding I/O requests. If this is 
the case, PSS enables interrupts allowing the I/O 
Supervisor to process, if necessary, and then immediate- 
ly disables them. If the I/O in question has been com- 
pleted, the IORE will have been removed from the 
chain during the time interrupts were enabled. 

PSS restarts at the beginning of the chain and checks 
again, repeating the above steps until it comes to the 
end of the chain without having found any active 
elements for the task. When it reaches this point, there 
are no longer any IOREs associated with the task and 
it is in fact quiescent. 

It should be noted that since the PSS task has a 
higher priority than the task to be quiesced, it does 
not allow any new I/O requests to be initiated by that 
task since PSS retains the computer resources. 
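The quiescing loop described above can be sketched as follows. This is a simulation under stated assumptions, not the PSS code: the `System` class, the `remaining` counter, and `service_io_interrupts` are hypothetical stand-ins for the SVT-anchored IORE chain and the I/O Interrupt Handler, and a Boolean flag stands in for the privileged enable and disable instructions.

```python
from dataclasses import dataclass


@dataclass
class IORE:
    ioreid: int      # matches the TCBIDF of the initiating task
    remaining: int   # hypothetical: interrupt windows until this I/O ends


class System:
    def __init__(self, iore_chain):
        self.iore_chain = list(iore_chain)   # located via the SVT pointer
        self.interrupts_enabled = True

    def service_io_interrupts(self):
        """Stand-in for the I/O Interrupt Handler: completed requests
        have their IOREs removed from the chain."""
        for iore in self.iore_chain:
            iore.remaining -= 1
        self.iore_chain = [i for i in self.iore_chain if i.remaining > 0]


def quiesce(system, tcbidf):
    """Loop until no IORE belongs to the task; return windows opened."""
    windows = 0
    system.interrupts_enabled = False
    # Rescan from the start of the chain after every interrupt window.
    while any(iore.ioreid == tcbidf for iore in system.iore_chain):
        system.interrupts_enabled = True    # let pending I/O complete
        system.service_io_interrupts()
        system.interrupts_enabled = False   # then immediately disable
        windows += 1
    return windows


system = System([IORE(6, 2), IORE(2, 5), IORE(6, 1)])
windows = quiesce(system, tcbidf=6)
```

In this run the task with identification number 6 needs two interrupt windows before its last IORE leaves the chain; the IORE belonging to task 2 is untouched by the scan and stays queued.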

Erasing of a partition boundary and 
task deletion 

There is control information which is received by 
the communications routines within the PSS task
which must be accessible to the LPOL task for both
reading and writing (such as indications of which LPOL
routine is to be run, the replacement value for the
next cycle time which is calculated by the LPOL task
as a function of its current running time, entry point
addresses of routines mutually shared by the PSS and
LPOL tasks, etc.). Additionally, task management is
greatly facilitated by extending the PSS task area to 
include the LPOL function while controlling via the 
PSS Task Control Block (TCB) rather than modifying 
the off-line task TCB or creating a new one. 

In order to make the shared task area a memory
extension of the PSS task, the memory areas must be 
linked. This is achieved by modifying the TABB (see 
Figure 5) of the PSS task so that the LOADDR field 
points to the low address of the shared task. Figures 
8 and 8a show the pointer relationships before and 
after these TABB modifications. 

The storage protection feature must now be satisfied 
to make the two storage areas completely contiguous. 
Since there is a mismatch in storage keys between the 
PSS and shared tasks, the keys associated with each 
protected block of memory within the shared task are 
reset to match those of the PSS task. At this point,
the two task areas have become a contiguous block of
memory assigned to the PSS task area.

Figure 9 shows how TCBs are linked together within 
the system. Note that each entry in the TCBLIST 
points to a TCB and each TCB points to the next 
lowest priority TCB in the chain. Figure 9a shows the 
arrangement of the TCBLIST and the TCBTCB field 
in the next-to-last TCB in the chain after modification 
to three partitions. This has been done by replacing 
the pointer to the last TCB in the TCBLIST with a 
pointer to the next-to-last TCB, and setting the TCBTCB
field of the next-to-last TCB to zero. These modifications
have additionally made the last task nonexistent
to the operating system.

Figure 8 — TABB pointers in PSS and offline task
prior to modification

Figure 9 — Portion of nucleus showing TCBLIST and
TCBTCB pointer relationship prior to modification

Figure 9a — TCBLIST and TCBTCB pointers after
modification
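
The TCBLIST and TCBTCB change described above amounts to unlinking the tail of a singly linked chain. The sketch below illustrates that idea only; the `TCB` class, the list-based `tcblist`, and the partition names are assumptions made for the example, with `None` standing in for a zero TCBTCB field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TCB:
    name: str
    tcbtcb: Optional["TCB"] = None   # next lower-priority TCB; None models zero


def hide_last_tcb(tcblist: List[TCB]) -> TCB:
    """Make the lowest-priority task invisible; return its TCB for later relinking."""
    last, next_to_last = tcblist[-1], tcblist[-2]
    tcblist[-1] = next_to_last       # last list entry now points at the next-to-last TCB
    next_to_last.tcbtcb = None       # the chain ends one element earlier
    return last


def chain_names(head: TCB) -> List[str]:
    """Follow TCBTCB pointers from `head` and collect task names."""
    names = []
    node: Optional[TCB] = head
    while node is not None:
        names.append(node.name)
        node = node.tcbtcb
    return names


# Usage with four hypothetical partitions, highest priority first.
tcbs = [TCB("P0"), TCB("PSS"), TCB("P2"), TCB("OFFLINE")]
for upper, lower in zip(tcbs, tcbs[1:]):
    upper.tcbtcb = lower
tcblist = list(tcbs)
hidden = hide_last_tcb(tcblist)
```

Returning the hidden TCB keeps the reverse operation simple: restoring the four-partition configuration only needs the two pointers put back.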

Rollout/Rollin 

The process of rolling out the off-line task and rolling 
in the LPOL task is a straightforward write/read 
operation to a disc file. Since storage is divided into 
2048 byte units for assignment of storage keys, the 
task area read or written is some multiple of 2048 
bytes in length. Thus the records are read or written 
in 2048 byte blocks for purposes of simplicity and 
efficiency. 
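
A minimal sketch of block-wise rollout and rollin follows. Only the 2048-byte record size comes from the text; the function names are hypothetical and an in-memory buffer stands in for the disc file.

```python
import io
from typing import BinaryIO

BLOCK = 2048  # storage-key granularity given in the text


def roll_out(task_area: bytes, disc: BinaryIO) -> int:
    """Write the task area as 2048-byte records; return the record count."""
    if len(task_area) % BLOCK:
        raise ValueError("task area must be a multiple of 2048 bytes")
    for offset in range(0, len(task_area), BLOCK):
        disc.write(task_area[offset:offset + BLOCK])
    return len(task_area) // BLOCK


def roll_in(disc: BinaryIO, n_blocks: int) -> bytes:
    """Read back `n_blocks` records and reassemble the task area."""
    return b"".join(disc.read(BLOCK) for _ in range(n_blocks))


# Usage: an in-memory buffer stands in for the disc file.
disc = io.BytesIO()
area = bytes(range(256)) * 24        # 6144 bytes = 3 blocks
n = roll_out(area, disc)
disc.seek(0)
restored = roll_in(disc, n)
```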

Free area modification 

The PSS and LPOL tasks now occupy the same task 
area. It is necessary, therefore, to make certain modifi- 
cations which will cause all requests for work storage 
to be satisfied from that portion of the task area wholly 
dedicated to the LPOL task. Although no task bound- 
ary exists between LPOL and PSS, if work storage 
were to be allocated from the PSS domain, it would 
not be subsequently saved and restored in future 
cycles since the PSS area is not included in the dynamic 
area which is stored on the disc file. 

Figures 10 and 10a show how these modifications 
are accomplished. Initially (Figure 10) the FALPT field 
of the PSS TABB is pointing to the free area within 
what was its own task area. This is the normal condition 
for this pointer when there is an operating off-line 
task. However, we have modified the configuration to
three task areas, and we now wish the only available
free area to lie entirely within the LPOL area. Figure
10a shows that the FALPT field of the PSS TABB
has been re-pointed to the first FAL within the LPOL
task area.

Figure 10 — FALPT relationship with FAL locations
prior to modification

Figure 10a — FALPT fields after modification

At this point, the LPOL task is ready to process
whatever request caused it to be activated. We have
now covered steps 1 through 7 under General Discussion.
In returning from the three-partition to the
four-partition environment, the steps are essentially
the reverse of those detailed.

Upon restoring the off-line task, PSS enters a wait 
state and will be restarted as previously outlined. The 
task dispatcher portion of O/S will restart the off-line 
task as soon as there is available computer time and 
no higher priority tasks require the computer resources. 


Initialization 


The initialization process for PSS consists of:

1. Suspending of off-line processing.

2. Reconfiguration from four to three partitions.

3. Rolling out the off-line task.

4. Making the off-line task area one contiguous
free area.

5. Loading the LPOL task and allowing it to
initialize itself.

6. Rolling out the LPOL task.

7. Rolling in and restarting the off-line task.

8. Entering the normal cycle at the wait point.


Step 4 above has not been previously covered in 
detail. In order to force the initial loading of LPOL into 
the desired location, the FALs for PSS are initially 
modified. Figures 10 and 10a show the PSS TABB
before and after this is done. The FALPT field of the 
PSS TABB initially points to the first FAL within 
the PSS area. The FALPT field of the LPOL TABB 
points to the first FAL of its task area. By altering 
the FALPT of the PSS TABB to make it point to the 
LPOL first FAL and by altering the FAL by both 
making it the last FAL in the chain and indicating 
one large block of free memory, we have created a 
large free area available to PSS for loading the LPOL 
programs. 

As the LPOL task acquires and releases memory 
blocks for work storage, the FALs within the area 
are modified by the operating system consistent with 
memory availability. PSS simply saves the pointer to 
the first LPOL FAL prior to each rollout and restores it 
after rollin and prior to reinitiating LPOL. Continuity 
of FAL linking is maintained in this fashion. 


Special handling 

There are occasions when the off-line partition can- 
not be quiesced. This could be caused by a card reader 
jam, a printer being out of paper, etc., causing an 
IORE associated with the I/O to remain linked in the
chain beyond some reasonable amount of time (presently
10 seconds). These conditions are relatively
infrequent; however, provision has been made for them 
by advising the operator via the computer console 
typewriter and an attention bell that the off-line task 
is non-quiescent and requires attention. 

The memory area actually required by PSS is less 
than 6K. However, in order to initially load PSS into 
memory, a large enough partition must be available to 
furnish the operating system job scheduler routines 
their required amount of core. This requirement is on
the order of 24K. Thus there is a pre-initialization
phase during which PSS changes the initial configuration
(Figure 11) of 50K and 52K to 6K and 96K for
the PSS and off-line tasks, respectively (Figure 1).
The technique for doing this will not be detailed; however,
the essential steps are as follows:

1. Referring to Figure 12, the initial PSS task area
is shown in three segments (B, C, D) and the
initial off-line task area is shown in one segment
(A). The PSS Pre-Initializer is loaded by the
operating system into area B.

2. In order to place the PSS main program in the
area where it can control storage, it must be
forced into area D. To achieve this, the task
area boundary block is modified to make area
D free and areas B and C unavailable.

3. The PSS main program is loaded into area D.

4. The off-line boundary block is modified to include
areas B and C as free areas.

5. Control is passed to PSS main.

The configuration is now that of Figure 1.

Figure 11 — Initial task core allocations

CONCLUSION

Implementation of PSS has effectively added 96K of 
additional processor memory to the real-time system 
of which it is an integral part. This coupled with the 
facility to process off-line tasks while having an availa- 
ble stand-by on-line task, has greatly enhanced the 
capability of the system. The application of PSS has 
effected a maximal utilization of computer resources 
by the system. 

INTRODUCTION

Large-scale integration for computer applications 
has been predicted for several years, but close examina- 
tion shows that the progress has been uneven. Memory 
designers continually demand higher levels of inte- 
gration for larger and faster memory systems, and 
new memory concepts are being developed to further 
exploit the characteristics of large-scale integration. 
The one-thousand-circuit chip will become nothing 
more than a milestone. 

But what of the logic area? Here, we struggle along 
hoping to find some high-volume applications for chips 
with a mere fifty circuits. When we design a medium- 
sized machine we find that so much unit logic is re- 
quired that the average level of integration falls below 
ten. Orderly memory and random logic integrated 
circuit fabrication procedures are growing so different 
that thought is being given to building different types 
of manufacturing facilities. This represents a rather 
drastic approach and in the authors' opinions may 
prove unnecessary. 

The success to date in memory is encouraging, for 
it gives direction to logic. Memory products should 
therefore be examined critically for they may well 
hold the key to success for logic products. The salient 
features of a chip used in a memory product are:

• Regularity. Memory arrays are regular in com- 
ponents and wiring. The layout geometry is well 
defined and can be highly optimized for total 
chip utilization. 

• Low Power. Memory systems are designed and
partitioned so that all circuits on a chip do not
dissipate maximum power at the same time.

• Well-Defined Function. The memory chip de- 
signer knows exactly how his chip fits into the 
entire memory system. He therefore can opti- 
mize on a high level. As examples, he uses special 
circuits for the latch functions and uses de- 
coders redundantly to save pads. 

• Volume. While the initial memory chip design
is quite complex, the volume requirement makes
the initial design cost nearly negligible. With
this ground rule the chip can be highly engineered,
and nearly an order-of-magnitude improvement
can be expected and obtained.

Structured logic, or array logic as it is sometimes 
called, is an attempt to design logic with more of the 
characteristics of memory. Many unsuccessful starts 
have taken place, but we shall discuss some of the 
more successful efforts. We shall also add some thoughts 
of our own, but it should be pointed out that the prob- 
lem is far from solved. 

Logic arrays 

The basis of all array logic is a matrix of elements 
with programmable interconnections. Diode structures 
have been proposed in the past, and a matrix of com- 
mon collector transistors is of recent interest. The 
transistor array is programmed in the factory by 
connecting or not connecting the emitter of each 
transistor to a common line. (See Figure 1.) We shall 
use transistor arrays in our examples, for that is what 
we have been working with, but diode arrays should 
not be ruled out. 
Figure 1 — A transistor array


The ROS

The read-only store (ROS) array in its simplest
form uses two decoders to feed the array: one feeds
the horizontal lines and the other the vertical lines,
as shown in Figure 2. A particular grid position in the
array is selected by activating the appropriate horizontal
and vertical decoder lines. The addressed cell
of the array is located at the intersection of the two
activated lines. If the emitter at this address is connected
to the horizontal decoder line, then a 1 has
been programmed into this particular cell in the array.
If the emitter is unconnected, a 0 is said to be programmed
into the array. The presence of the programmed
1 or 0 is sensed at the output when that
particular cell is addressed. The horizontal output lines
are dot-ORed together to produce one common output
line, as shown in Figure 3.

Conceptually, the ROS is related directly to a 
Karnaugh map, one bit position in the array for each 
square in the appropriate Karnaugh map. Figure 4 
depicts the four-variable K-map that relates to the 
ROS of Figure 2. This relationship proves the uni- 
versality of a ROS, for any Boolean function that 
can be K-mapped can be implemented directly. Uni- 
versality is the feature of the ROS chip most often 
described as an asset, but in practice it is seldom use- 
ful except in code translators. The Boolean functions 
used in the design of any computer are definitely not 
random and not evenly distributed among all pos- 
sible functions of n variables. This fact is well docu- 
mented in the many failures with other universal 
logic blocks (ULB’s). The real problem with the ROS 
array is that it doubles in size each time an input 
variable is added. This doubling in size is necessary 
to maintain the dubious value of being universal. 
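
The one-bit-per-K-map-square view of the ROS, and the doubling in size with each added variable, can be illustrated with a small lookup-table sketch. The function names are hypothetical, and the two-decoder row/column addressing is flattened here into a single address for brevity.

```python
from itertools import product
from typing import Callable, List, Sequence


def program_ros(f: Callable[..., int], n: int) -> List[int]:
    """Store one bit for every combination of n inputs (one per K-map square)."""
    return [int(bool(f(*bits))) for bits in product((0, 1), repeat=n)]


def read_ros(array: Sequence[int], inputs: Sequence[int]) -> int:
    """Address the array with the input bits, first input most significant."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return array[address]


def carry(a: int, b: int, c: int) -> int:
    return (a & b) | (b & c) | (a & c)


# Usage: the CARRY function of three variables needs 2**3 = 8 bits.
ros = program_ros(carry, 3)
sizes = [len(program_ros(lambda *x: 0, n)) for n in range(2, 6)]
```

The `sizes` list grows by a factor of two per added variable regardless of how simple the stored function is, which is the cost of universality that the text describes.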

The ROAM 

The read-only associative memory (ROAM) is a
matrix of common collector transistors that may be
programmed by connecting or not connecting the base
of each transistor to a common line in its own column
(Figure 5). The emitters of each row are commoned
and feed the emitter of an output transistor. Each
row of array transistors and the associated output
transistor form a current switch.

Through phase splitters, each input variable has
both true and complement lines available to the array.
Hence, each variable controls a true line and a complement
line (column) in the array. This gives rise
to the word “associative” in the name. By programming
each row in the array to a particular pattern
of 1’s and 0’s, the input word pattern will “associate”
(compare) with the appropriate row in the array. If
there is no match, the outputs will remain logical zeros.
If at least one row has a pattern the same as the input
pattern, there will be a logical one output on that
horizontal line (row).

Figure 2 — Read-only store

Figure 3 — Read-only-store circuits

Figure 4 — Karnaugh map

Figure 5 — Read-only associative memory

To program the array, each base is tied to a true 
line (column), a complement line (column), or is 
left floating. Thus, for a base tied to a true line, a 1 
on that input line will yield a 1 at the emitter and a 
1 at the output, since the row of emitters effectively 
forms a DOT-OR (positive logic). Bases tied to a true 
line are equivalent to a logical 1, since a 1 at that in- 
put causes a 1 at the output. 

Conversely, a base tied to a complement line is
equivalent to a logical 0. A 0 at a particular input
raises the complement line of the phase splitter,
thereby raising to the 1 level all emitters of transistors
in that column that have their bases tied to the complement
line (column).

If the base is left floating, that array grid position 
is effectively a DON’T CARE. That is, the output 
line will not be raised to 1 by either a 1 or 0 at that 
transistor’s column input. 

Figure 6 illustrates the implementation of an adder 
position with SUM and CARRY outputs using a 
ROAM array. A black triangle connecting a vertical 
line and a horizontal line indicates a base connection; 
lack of a black triangle indicates a floating base. Note 
that if a true line is connected, then the complement 
line is not connected, and vice versa for each array 
grid position. Thus, at most, only 50 percent of the hori- 
zontal and vertical intersections will ever be used. 

To conceptually understand the ROAM and relate 
it to the Karnaugh Map it is convenient to think in 
terms of negative logic. Thus, down levels are logical 
1, the commoned emitters of each row form a DOT- 
AND (all emitters down results in a down level, any 
emitter up results in an up level), and dotting the output 
transistors results in a DOT-OR. 

Each row of the ROAM represents a term of a
logical expression in the sum-of-products form. The
logical expression CARRY = B • C + A • B + A • C
is in sum-of-products form, and B • C, A • B, and
A • C are each terms of the expression. Each term
may be implemented on one row of the ROAM. For
example, Figure 6 illustrates the implementation of
the CARRY function. Note that the A true and B
true columns are both connected to a transistor base
in the second row of the ROAM array, yielding the
term A • B. The three rows B • C, A • B, and A • C
are DOT-ORed at the output to yield B • C + A • B
+ A • C = CARRY. In forming the term A • B,
the variable C does not have its true or complement
column line connected to a base. CARRY is 1 if A is
1 and B is 1 regardless of the value of C.
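
The row-per-term reading of the ROAM can be sketched in a few lines. This models the logical behavior only, not the circuit: each row is written as a string over the inputs A, B, C, where `1` means the base is tied to the true line, `0` to the complement line, and `-` means a floating base (don't care). The string encoding and function names are assumptions made for the example; the SUM rows are the four minterms of the standard full-adder sum.

```python
from itertools import product
from typing import Sequence


def row_matches(row: str, inputs: Sequence[int]) -> bool:
    """A row fires when every connected position agrees with the input."""
    return all(cell == "-" or int(cell) == bit for cell, bit in zip(row, inputs))


def roam_output(rows: Sequence[str], inputs: Sequence[int]) -> int:
    """Rows are ORed together, so any matching row raises the output."""
    return int(any(row_matches(row, inputs) for row in rows))


def connections(row: str) -> int:
    """Count the bases actually tied to a column line in one row."""
    return sum(cell != "-" for cell in row)


def roam_array_bits(rows: Sequence[str]) -> int:
    """Two columns (true and complement) per variable, one row per term."""
    return 2 * len(rows[0]) * len(rows)


CARRY_ROWS = ["-11", "11-", "1-1"]          # B.C, A.B, A.C
SUM_ROWS = ["001", "010", "100", "111"]     # the four minterms of SUM

# Usage: check both outputs against ordinary arithmetic on A + B + C.
ok = all(
    roam_output(CARRY_ROWS, bits) == int(sum(bits) >= 2)
    and roam_output(SUM_ROWS, bits) == sum(bits) % 2
    for bits in product((0, 1), repeat=3)
)
```

Each CARRY row uses two connections and each SUM row three, so larger loops on the Karnaugh map translate directly into fewer connected bases per row.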

Each term of a logical expression in sum-of-products
form is an “implicant” on a Karnaugh map. An implicant
is formed by looping the 1’s in the Karnaugh
map and “reading” the loops from the map. Loops
can only contain adjacent 1’s, and the number of 1’s
in a loop must be a power of 2 (1, 2, 4, ...).
This results from the fact that adjacent squares on a
Karnaugh map always differ only by the value of
one variable. Two squares looped yield a term with
n-1 variables (n = number of variables), four squares
looped yield a term with n-2 variables, etc. Thus,
each implicant requires one row in a ROAM. The
bigger the loop of 1’s, the fewer connections need be
made in that row. The complete expression is formed
by DOT-ORing the rows, which is the same as ORing
the implicants.

The example of Figure 6 uses three loops of two
1’s each to form the CARRY. The SUM is formed
by four loops of one 1 each. In this case three connections
must be made in each of the four required rows
to obtain

$$\mathrm{SUM} = \bar{A} \cdot \bar{B} \cdot C + \bar{A} \cdot B \cdot \bar{C} + A \cdot \bar{B} \cdot \bar{C} + A \cdot B \cdot C$$

In contrast to the ROS, the ROAM can have universal
capability with only one-half the number of
rows as the ROS needs bits for the same number of
variables. Moreover, the ROAM does not need to be
universal to be useful, thus allowing even further
reduction in size. Table I illustrates the difference
brought about by the ROS requiring one bit per K-map
position and the ROAM requiring one row per K-map
implicant.

TABLE I — Bits required for n variables in ROS and ROAM arrays

                             VARIABLES (n)
                         2    3    4    5    6    7     8   General
ROS (always universal)   4    8   16   32   64  128   256   2^n
ROAM, by number of rows (implicants):
 2 rows                  8   12   16   20   24   28    32   4n
 3 rows                  -   18   24   30   36   42    48   6n
 4 rows                  -   24   32   40   48   56    64   8n
 5 rows                  -    -   40   50   60   70    80   10n
 6 rows                  -    -   48   60   72   84    96   12n
 7 rows                  -    -   56   70   84   98   112   14n
 8 rows                  -    -   64   80   96  112   128   16n
 9 rows                  -    -    -   90  108  126   144   18n
 16 rows                 -    -    -  160  192  224   256   32n
 2^n/2 rows (universal)  8   24   64  160  384  896  2048   n·2^n

Historically, computer functions are composed of
about four implicants or terms. The chart shows that
a four-implicant function is cheaper to implement
with a ROAM than with a ROS when the function
contains six variables or more. When the decoders
required for the ROS are considered, even four-variable
functions with four implicants are more economical
in ROAM than in ROS.

Two useful formulas to compare ROS bits required
with ROAM bits required for a given function are:

ROS bits = $2^n$

ROAM bits = $2In$,

where n = number of variables, I = number of implicants.
Thus, it is more economical to build a function
with the ROAM when $2In < 2^n$. This does not
consider the cost of the ROS decoders, which add a
term to the right-hand side of the inequality.

If we assume that the decoders for n even take
$2n \cdot 2^{n/2}$ bits, and for n odd take
$(n+1)\,2^{(n+1)/2} + (n-1)\,2^{(n-1)/2}$ bits, then the cases
for which ROAM should be used are:

1. n even

$$2In < 2^n + 2n \cdot 2^{n/2};$$

2. n odd

$$2In < 2^n + (n+1)\,2^{(n+1)/2} + (n-1)\,2^{(n-1)/2}.$$

Thus, ROAM is more economical than ROS in most
practical problems.

A realistic example of control logic for a small machine
model has been implemented using the ROAM
array. Table II gives a comparison of the number of
bits required for a ROAM implementation versus the
number of bits required for a ROS implementation.
Note that the ROAM is significantly more economical.

TABLE II — ROS vs. ROAM — a control logic example

TOTAL NUMBER OF VARIABLES     14
TOTAL NUMBER OF FUNCTIONS      6
TOTAL NUMBER OF IMPLICANTS    12

  One 7-implicant function of 13 variables
  Four 1-implicant functions of 7 variables
  One 1-implicant function of 11 variables

ROAM
  ARRAY SIZE: 28 X 12                              336 BITS

ROS 1
  ARRAY SIZE/FUNCTION: 2^14                     16,384 BITS
  6 ARRAYS FOR 6 FUNCTIONS: 6 X 16,384          98,304 BITS
  SHARED DECODER                                 3,584 BITS
  TOTAL BITS                                   101,888

ROS 2
  ARRAY SIZE FOR 13 VARIABLES: 2^13              8,192 BITS
  ARRAY SIZE FOR 7 VARIABLES: 2^7 X 4              512 BITS
  ARRAY SIZE FOR 11 VARIABLES: 2^11              2,048 BITS
  SHARED DECODER                                 3,584 BITS
  TOTAL BITS                                    14,336

A partitioning of functions could have been devised
for the ROS implementation. The ROAM would still
be more economical than the ROS, however, especially
when one considers the additional wiring complication
of connecting several small ROS arrays and the additional
design time required to effectively partition
the functions.
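
The bit-count comparison between ROS and ROAM can be checked numerically. The sketch below encodes the two bit formulas and the decoder assumption stated in the text (2n·2^(n/2) bits for even n, and (n+1)·2^((n+1)/2) + (n-1)·2^((n-1)/2) bits for odd n); the function names are introduced here for illustration only.

```python
def ros_bits(n: int) -> int:
    """One bit per combination of n input variables."""
    return 2 ** n


def roam_bits(n: int, implicants: int) -> int:
    """True and complement columns for n variables, one row per implicant."""
    return 2 * implicants * n


def decoder_bits(n: int) -> int:
    """Decoder cost for an n-variable ROS under the text's assumption."""
    if n % 2 == 0:
        return 2 * n * 2 ** (n // 2)
    return (n + 1) * 2 ** ((n + 1) // 2) + (n - 1) * 2 ** ((n - 1) // 2)


def roam_is_cheaper(n: int, implicants: int, count_decoders: bool = True) -> bool:
    """Evaluate the inequality 2In < 2^n (+ decoder bits, if counted)."""
    ros_total = ros_bits(n) + (decoder_bits(n) if count_decoders else 0)
    return roam_bits(n, implicants) < ros_total


# Usage: reproduce the totals of the 14-variable control logic example.
roam_total = roam_bits(14, 12)
ros1_total = 6 * ros_bits(14) + decoder_bits(14)
ros2_total = ros_bits(13) + 4 * ros_bits(7) + ros_bits(11) + decoder_bits(14)

# Smallest n at which a four-implicant function favors the ROAM.
first_without_decoders = next(n for n in range(2, 20) if roam_is_cheaper(n, 4, False))
```

Ignoring decoders, the crossover for four implicants falls at six variables; with decoders counted, the four-variable case already favors the ROAM (32 bits against 48).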

The optimum size for a ROAM has not been de- 
termined, but chips with at least 512 bits on them are 
desirable. This capacity would provide between eight 
8-variable, 4-implicant functions, and one 64-variable, 
4-implicant function (an extreme case, needless to say) 
on a chip. The practicality of building and using such 
a chip is yet to be determined. 
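
As a quick check of the 512-bit figure, assuming the ROAM bit count $2In$ with $I = 4$ implicants per function, both ends of the quoted range fill the chip exactly:

```latex
\[
8 \times (2 \cdot 4 \cdot 8) = 8 \times 64 = 512,
\qquad
1 \times (2 \cdot 4 \cdot 64) = 512 .
\]
```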

The SLT array 

Arrays can be designed so that they may be used for 
direct replacement of present logic. The SLT array 
performs the function AND-OR-INVERT in nega- 
tive logic or OR-AND-INVERT in positive logic 
and can be used directly to replace SLT logic. While 
direct replacement of random logic with array chips 
may prove to be the wrong approach in the long run, 
it may well be the only way to get array logic started. 

The SLT array has the same advantages over or- 
dinary logic that all arrays have: orderliness of design 
and layout, and high density with relatively low cost. 



Figure 7 — SLT array 


In addition, this type of array has a higher bit usage 
than other arrays, since it more closely resembles the 
familiar random logic, functionally. The SLT array 
does not have decoders or phase splitters on its input 
lines, as do other types of arrays. This makes the array 
less universal than even the ROAM array but more 
effective for random logic. It is fair to say that arrays 
of this type make poor code translators just as SLT 
logic builds poor translators. It is difficult to believe 
that any array will be effective in both random logic 
and code translation problems. 

As already stated, the ROAM array has specific 
applications to decoders and associative memory 
problems. The SLT array may very well be the ele- 
ment required to do general logic design. The reason 
for this is the placement of the inverters as shown in 
Figure 7. This movement of the inverters to the out- 
put lines may appear a minor modification, but it 
should be remembered that there has never been a
useful logic block with inverters on the input lines. It 
may pay to have both true and complemented out- 
puts from a current switch logic block. Figure 8 shows 
a full adder implementation in SLT logic and in an 
SLT array. 

Array-driving arrays 

The SLT array in Figure 8 demonstrates one neces- 
sary feature of an array that has yet to be discussed: 
Any logic array must be able to drive any other array 
in the same family, including itself. Note in Figure 
8 the CARRY output fed back into the array. This 
line probably will be an external wire. This technique 
is required since it is in effect Boolean factoring, a 
proven necessity. This type of feedback is also needed 
to produce sequential circuits, giving memory to the 
arrays. 

Figure of merit 

It is less meaningful to compare array logic with
random logic separately in terms of power consumption,
propagation delay time, and silicon area,
since one can usually be traded for another, such as
power for delay. Instead, a comparison is made of
their figures of merit, chosen to be the product of
power consumption P, delay time T, and silicon area
A, all with a weight of one (PTA). Since no
isolation wall is needed between collector transistors,
a ROS or ROAM cell including appropriate interconnections
can be laid out on a silicon chip area equivalent
to 20-25 percent of that occupied by a transistor
that needs isolation walls. As shown in Figures 5 and 7,
the delay time of an array is two levels of current
switch emitter follower (CSEF) independent of the
number of inputs. For sophisticated functions, such
as the one-bit adder shown in Figure 8, more than two
levels of logic may be required.

Typical comparisons of array logic and random
logic include sample designs of array logic chips
that perform the same function as a random logic chip.
Such comparisons help to reveal, at least in part,
the merits and limitations of array logic. In
comparison with random logic chips that perform
sophisticated functions or have two or more cascaded
levels of CSEF’s, array logic chips have superior
PTA figures.

CONCLUSIONS 

Various array configurations described here suggest 
that random logic may be implemented by use of an 
array of programmable crosspoints. Comparisons of 
array logic with conventional logic indicate that in 
many cases the PTA figure of merit is superior for 
arrays. The most significant problem with arrays appears
to be the limited useful size of a single array,
and the difficulty in standardizing a particular array 
configuration. As a minimum achievement at this 
time, it appears that arrays will be useful in develop- 
ment of complex functions within a silicon chip. 

Array logic will not eliminate the need for a circuit 
designer in the future, since specialized designs will 
be needed to optimize circuit and component technol- 
ogy. In some of these design cases, the importance of 
array logic techniques will be obvious, but in others 
it will not be. 

At this point, array logic does not appear to strongly 
affect the system designer’s approach to machine de- 
sign, and a knowledge of array logic may never be re- 
quired. 

In the future, however, to the extent that array 
logic techniques influence the design and optimization 
of highly efficient functions, the system designer’s 
work will be significantly influenced by progress made 
in developing array logic techniques. 

BACKGROUND

Since the advent of LSI technology, several schemes 
have evolved for the utilization of large arrays to their 
full potential. A common and straightforward approach 
involves the designer restricting himself to the equip- 
ment being designed at the moment. Faced with only 
a limited set of problems, it is not difficult to specify 
a small number of LSI array types which will efficiently 
complete the design. While the results are quite en- 
couraging for specific cases, 1 the drawbacks of any mass 
adoption of these techniques are obvious. This, the 
so-called “custom approach,” would require the semi- 
conductor manufacturer to be responsive to each cus- 
tomer with numerous low-output production runs of 
highly specialized devices. The per-unit cost to the 
user, for his own efforts as well as those of the manu- 
facturer, would be quite high due to the inability to 
spread initial costs over many devices. In addition, 
the complexity of 100-gate-plus arrays is such that it 
is difficult to substitute one for another (with efficient 
results). This would severely limit the off-the-shelf
capabilities of both user and manufacturer. 

An obvious solution to these problems is the intro- 
duction of a small set of standard LSI chips. Semi- 
conductor suppliers, making tentative advances into 
LSI product marketing, have already proposed such 
devices as adders, counters, and shift registers. How- 
ever, this does not represent the solution to the general 
problem. A design heavily committed to the use of these 
devices must fall back on MSI or standard IC for the 
large remainder of the circuitry. The reason is that 
adders, counters, registers and other orderly, well-defined
areas represent the regions of the system with
the highest gate-to-pin ratios. After these portions are 
lifted out of the system, the remainder is characterized 
by very low gate-to-pin ratios (notably control and 
data routing functions). Unable to satisfy the LSI 
design criteria of high gate-to-pin ratios any longer, 
the designer must look to more standard components. 
Unfortunately, any proposed solution to the LSI 
partitioning problem which lacks a total system ap- 
proach tends to drift towards this pitfall. 

Researchers striving towards partitioning for total 
or near-total LSI implementation tend to diverge 
along one of two conceptual paths: bit-slicing and
functional partitioning. To illustrate the difference, 
consider the data portion of the computer. In functional 
partitioning one may specify an adder as one LSI ar- 
ray, registers as another, a shift register as a third, and 
so forth. On the other hand, in bit-slicing one would 
design an LSI array consisting of a combined one- or 
two-bit adder, registers, shift registers, etc., then build 
up his system from this chip type according to the de- 
sired word length. 
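
The bit-slicing idea, one chip type replicated to reach the desired word length, can be sketched with the adder portion of a slice. This is an illustration only: a real slice as described above would also hold registers and shift logic, and the function names here are hypothetical.

```python
from typing import Tuple


def adder_slice(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One-bit slice: returns (sum bit, carry out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (b & carry_in) | (a & carry_in)
    return s, carry_out


def ripple_add(x: int, y: int, word_length: int) -> Tuple[int, int]:
    """Chain `word_length` identical slices, least significant bit first."""
    result, carry = 0, 0
    for i in range(word_length):
        s, carry = adder_slice((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Usage: the same slice serves a 4-bit and a 16-bit word.
small = ripple_add(5, 6, 4)
overflow = ripple_add(15, 1, 4)
wide = ripple_add(200, 100, 16)
```

Only the loop count changes with the word length, which is the property that lets a bit-sliced design be built from a small number of array types.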

The bit-slice approach has resulted in some notable 
advantages, particularly the ability to achieve very 
high gate-to-pin ratios and implement systems using 
a small number of different array types. 1,2 However, 
bit-sliced modules have the basic flaw of being system- 
dependent, a drawback described by Pariser in an 
early paper. 3 This means that behind such bit-slicing 
approaches there lie systems, real or implied, for which 
the resulting arrays are most efficient. An attempt to 
apply the arrays to a significantly different system 
results in a poor design. Considering the types of bit-slice
devices being proposed, inefficiencies would most
often be manifest in the design of a simple device in
which the majority of the gates of the array intended
to accomplish complex functions are wasted. Although
this may be acceptable in some situations, it is unlikely
that it would satisfy the strict requirements of
size, weight, power, and reliability imposed by aerospace
and military systems.

It is the contention of this paper that a judicious
partitioning of digital systems in general, divorced 
from bias towards any particular system, results in a 
set of LSI devices that can entirely implement many 
different computer systems of varying functional com- 
plexities and word lengths. 

The resulting group of arrays, referred to as a
“character set” and each one individually as a different
“character”, is sufficiently small in number (10), with
each type having acceptable size and gate/pin ratio,
to be considered acceptable and desirable in view of its
wide range of applications. These building blocks are
referred to as characters because of the analogy that
may be drawn between the building blocks and characters
of the alphabet (letters). Letters form words
to express the language whereas building blocks form
units to build the machine. In both cases a closed set
(of characters) is used to produce the desired end.

Although the character set is neither rigidly functionally
partitioned nor bit-sliced, it is biased towards
functional partitioning to give it the versatility to
efficiently implement both complex and simple digital
devices. As an approach, functional partitioning has
a detailed and successful background. 3,4 Bit-slicing
considerations give the character set its ability to
implement systems of varying word lengths.

In addition to providing the user with a standard 
set of chips to implement many different digital ma- 
chines, the completeness of the approach (the ability 
of the characters to implement the whole machine) 
relieves the user of the burden of logic design. These 
tasks are reduced to the selection of character types 
and word lengths. 

Introduction to the character set 

A universal conclusion among LSI researchers is 
that control functions are more difficult to modularize 
than functions related to data operations. The 
micromemory control technique was chosen as the solution 
for LSI implementation for several reasons. A 
micromemory, meaning here a read-only solid-state memory 
with its sequencer and instruction register, is easily 
partitioned into the large modules necessary for LSI 
implementation. Control functions in this form are 


then amenable to reproduction in large quantities 
of identical units. Also, design with control centered 
in one level of micromemory is more orderly and 
straightforward. 

The micromemory has been provided with a rela- 
tively sophisticated microprogram instruction reper- 
toire. This means that the microprogram contains the 
essence of the machine’s major mathematical func- 
tions, such as multiply and complex sequencing. This 
is desirable since it represents an efficient use of hard- 
ware for these purposes and also reduces the number of 
different array types necessary. Also, a versatile rep- 
ertoire leaves the designer free to make units which 
operate as simply or as complexly as desired. The 
degree of flexibility which this repertoire gives the 
character set is a major factor in its success. It should 
be stressed that the “micro operations” of the charac- 
ter set are as important a factor as its logic design. This 
fact, a critical one in all LSI solutions committed to 
micromemory control, cannot be overemphasized. 

Interest in designing a character set at Hughes was 
concurrent with the development of an advanced com- 
puter system. The character set itself was developed 
with the ultimate objective of implementing all future 
Hughes digital data processing equipment with a com- 
mon family of LSI circuits. 

The outcome of that original effort revealed that 
computer structures in general are frequently ordered, 
or at least amenable to such ordering, as shown in 
Figure 1. 

The divisions of Figure 1 are functional. That is, 
regardless of the hardware characteristics, the computer 
philosophy is such that its functions may be identified, 
separated, and diagrammed as shown in the figure. 

From Figure 1 came the concept of the functional 
character set. With the fundamentals of LSI design 
in mind, logic was designed to accomplish each computer 
function indicated by the picture. Each unique LSI 
chip type which resulted was referred to as a different 
character type and given an identifying name and 
number. Figure 2 shows the character set which 
resulted from the logic design according to the concepts 
outlined in Figure 1. 

Figure 1 — Computer functional organization 
(Block labels legible in the figure: Boolean logic functions 
(minor): transfer, shift, rotate, complement, increment, 
logical OR, etc.; input/output functions; auxiliary devices: 
counters, clocks, scratchpad; core memory.) 

Figure 2 — Functional character set 

The character set and repertoire have been through 
several improvement cycles and used in the test im- 
plementation of a NASA computer to be discussed 
later. Current plans include test design of the H4400 
(a new Hughes computer) with the improved character 
set, implementation of the character set with high 
speed MOS circuits, and construction of one computer 
using the characters. 

These ten LSI characters alone provide the entire 
hardware complement for the logic of a broad range of 
computers and digital equipment. No extra logic in 
the form of either IC, MSI, or custom LSI need be 
added to the characters to finish the job. An important 
by-product of this is that the user need never consider 
logic design. His tasks are reduced to selection of the 
necessary characters and the writing of the appropriate 
microprograms for them. In fact, it is possible for the 
character set to fit into a realistic total design automa- 
tion procedure as discussed later. 

Description of the character set 

This section describes each of the ten characters. 
They are summarized below for reference. 

G1  Register storage 
L1  General logic 
L2  Arithmetic logic 
L3  Input/Output 
M1  Micromemory counter 
M2  Micro-instruction register 
MM  Micro-array 
P1  Scratch pad memory 
P2  Up/Down counter 
P3  Switch 

Characters of the same letter are logically grouped 
into a common unit as illustrated in Figure 3. 

G1 character 

The G1 character provides the bulk of storage for 
operands of the microprogram. Each character con- 
tains four registers of eight bits each accompanied by 
reading and writing selector gates. The storage element 
is provided with simultaneous dual reading and 
writing capability. The storage flip flop itself is designed 
for minimum read after write delay. 

Each of the two input busses is common to all 
registers and carries to the G1 character eight lines 
per bus, one line from each bus for each bit of the 
register. Input data selection is accomplished at the 
memory element by a coincidence of positive infor- 
mation on a particular input bus and register selection 
for that bus by destination decoding logic within the 
character. The destination decoding logic is duplicated 
to provide for writing from the two input busses into 
the same character under control of two different micro- 
commands. As will be illustrated later, this is a key 
factor for the machine expandability property of the 
character set as it allows G1 to form a data path link 
between individual logic units under control of up to 
two different micromemories. Different registers in 
the character may be written into simultaneously. 

Reading of the register is provided by dual source 
decoding logic which gates data to independent dual 
output busses. This duality provides for information 
from any two registers to be simultaneously placed on 
two output busses. The conceptual structure of the G1 
character is shown in Figure 4. 

Several G1 characters placed in parallel provide 
registers of more than eight bits in length. 
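The G1 behaviour described above can be modelled in a few lines. The following is a minimal sketch, not the Hughes logic design: the class and method names are hypothetical, and the case of both busses addressing the same register in one step is not specified by the text (here the second bus simply wins).

```python
class G1:
    """One G1 character: four 8-bit registers, two write ports, two read ports."""

    REGISTERS = 4
    MASK = 0xFF

    def __init__(self):
        self.regs = [0] * self.REGISTERS

    def write(self, dest_a=None, bus_a=0, dest_b=None, bus_b=0):
        # Each input bus has its own destination decoder, so two different
        # registers may be written in the same step.
        if dest_a is not None:
            self.regs[dest_a] = bus_a & self.MASK
        if dest_b is not None:
            self.regs[dest_b] = bus_b & self.MASK

    def read(self, src_a, src_b):
        # Dual source decoding: any two registers appear on the two output busses.
        return self.regs[src_a], self.regs[src_b]


class RegisterBank:
    """Several G1 characters in parallel, giving registers wider than eight bits."""

    def __init__(self, n_chars):
        # Slice 0 is assumed to hold the least significant byte.
        self.slices = [G1() for _ in range(n_chars)]

    def write(self, dest, value):
        for i, g in enumerate(self.slices):
            g.write(dest_a=dest, bus_a=(value >> (8 * i)) & 0xFF)

    def read(self, src):
        return sum(g.read(src, src)[0] << (8 * i)
                   for i, g in enumerate(self.slices))
```

Two G1 characters side by side then behave as four 16-bit registers: writing 0xBEEF to register 3 stores 0xEF in slice 0 and 0xBE in slice 1.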



Figure 3 — Typical functional character configuration 

L1 character 

The L1 character provides the basic logic functions 
selectable by microprogram. In addition, input bussing 
is provided for nine channels (eight bits/channel). 
One channel of the bus is required for each G1, L2, or 
L3 character connected to the L1 character. The 
logic functions provided consist of the rotates, shifts 
(logical), no-operation, complement, and incrementation. 
Also associated with the L1 character is the decoding 
logic for these logic operations. The type of 
microprogramming used with the functional character 
system relies heavily upon the fast and efficient 
manipulation of bits within the various operands. To this 
end, shifts and rotates have been provided which 
execute from 1 to 31 positions in a single step (as 
opposed to serial operation). Incrementation is 
accomplished with the use of a logic register which may also 
be used as a simple holding register. The L1 character 
is eight bits wide and contains the following logic: 

1. Bussing gates 

2. Decoding logic 

3. Rotate, shift, and complement logic 

4. Incrementer 

5. L register 

6. Gating to output bus 

Figure 5 shows a block diagram of the L1 
character. Several L1 characters may be connected 
together to perform logic operations on words longer than 
one byte. A limit of four bytes exists in order to 
maintain consistency of definition in the rotates and shifts. 

Figure 5 — L1 character block diagram 

Information entering the L1 card from the various 
sources is bussed to form the input bus. Then it is 
operated upon and the resultant is bussed to the output 
bus, where it leaves the character or is optionally 
stored in the L register (where it would thus be available 
at the next micro-instruction time for use in the 
increment operation or as an “L” source). 
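The L1 operations can be stated compactly as functions of an operand, a position count, and a word width of one to four bytes. This is an illustrative sketch only; the function names are hypothetical, and the rotate is taken to be a left rotate, as the microprogram repertoire later describes it.

```python
def mask(width):
    """All-ones mask for a word of the given bit width (8, 16, 24, or 32)."""
    return (1 << width) - 1


def shift_left(x, n, width):
    # Logical shift: bits moved past the top of the word are lost.
    return (x << n) & mask(width)


def shift_right(x, n, width):
    # Logical shift: zeros enter from the most significant end.
    return (x & mask(width)) >> n


def rotate_left(x, n, width):
    # Bits leaving the top re-enter at the bottom, in a single step.
    x &= mask(width)
    n %= width
    return ((x << n) | (x >> (width - n))) & mask(width)


def complement(x, width):
    # Ones complement of the operand.
    return ~x & mask(width)


def increment(x, width):
    # The incrementer wraps around at the word width.
    return (x + 1) & mask(width)
```

For an eight-bit operand 0x81, a one-position rotate gives 0x03, whereas the logical left and right shifts give 0x02 and 0x40.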

L2 character 

The L2 character provides the major arithmetic 
functions used by the microprogram. The arithmetic 
unit provides the 2’s complement sum of the con- 
tents of the A and B registers. Addition is performed 
with carry look-ahead byte parallel. Control signals 
may condition the adder to alternately provide either 
of two special results (a) a mod 2 addition instead 
of full addition or (b) an input carry to the lowest order 
bit for full addition (this forced carry in conjunction 
with a negated operand accomplishes a 2’s com- 
plement operand for subtraction). The L2 character 
consists of two holding registers for the operands of 
the adder, the adder itself, decoding and error logic, 
and bussing gates. Figure 6 diagrams function-wise 
the L2 character. 
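The two special adder modes can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the overflow rule shown (operands of like sign producing a result of unlike sign) is the conventional 2's complement test, which the text does not spell out for the L2 error logic.

```python
def l2_add(a, b, width=8, carry_in=0, mod2=False):
    """Return (result, carry_out, overflow) for the L2 adder model."""
    m = (1 << width) - 1
    a &= m
    b &= m
    if mod2:
        # Mod 2 addition: bitwise sum without carry propagation.
        return a ^ b, 0, False
    total = a + b + carry_in
    result = total & m
    carry_out = total >> width
    sign = 1 << (width - 1)
    # Assumed overflow rule: like-signed operands, unlike-signed result.
    overflow = bool(~(a ^ b) & (a ^ result) & sign)
    return result, carry_out, overflow


def l2_subtract(a, b, width=8):
    # A negated operand plus a forced carry into the lowest order bit
    # yields a - b in 2's complement.
    m = (1 << width) - 1
    return l2_add(a, ~b & m, width, carry_in=1)
```

With eight-bit operands, `l2_subtract(5, 3)` returns a result of 2 with carry-out 1 and no overflow, while `l2_add(0x7F, 1)` returns 0x80 with the overflow indication set.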

A typical arithmetic operation using the L2 charac- 
ter might proceed as follows: (1) first operand trans- 
ferred to B register (from output bus), (2) second 
operand transferred to A register, (3) after appropriate 
delay access result and transfer out of L2 character via 
the input bus. The error logic provides overflow and 
carry-out information. 

Figure 6 — L2 character block diagram 


L3 character 

The L3 character provides input/output capability 
for the microprogram machine. For purposes here 
input/output includes not only the usual peripherals 
but also main memory, scratch pads, real time clocks, 
all P-characters — namely all elements of the computer 
not directly controlled by the micromemory. The L3 
character provides input gating for external devices— 
four buffered and three non-buffered channels. The 
buffered-input gating may be controlled either by the 
microprogram or the external I/O device itself. Four 
I/O output channels are provided. Interrupt signal 
storage and interrupt mask storage for four channels are 
available. Parity generation and checking along with 
odd/even control is provided for the four buffered 
channels. L3 also contains the necessary register 
destination and selection logic. Figure 7 is a block diagram 
of L3. 

To input data, an input line is selected under 
microprogram control, resulting in selected data entering 
an E register or, in the case of a non-buffered input, 
entering the input bus. To output data, the 
micromemory places the data in the appropriate E register 
and signals the corresponding I/O unit. The E registers 
themselves are available to the logic unit in a manner 
identical to the G registers (G1) independent of their 
input/output functions. 

Figure 7 — L3 character block diagram 

M1 character 

The M1 character provides the micromemory address 
register and related functions. The ten address bits 
of M1 allow for addressing up to 1024 micromemory 
words. The address is contained in the MMC (Micro 
Memory Counter) register and serves to address the 
micromemory proper. Associated with the MMC 
register is a five-bit incrementer which automatically 
steps through 32 microprogram address states and 
then repeats addresses. This produces the effect of 
a microprogram ring of 32 words in which the program 
will loop until the microprogram issues an unconditional 
transfer command. There is an S (save) register that 
allows for subroutine jumps. The S register saves the 
content of MMC upon command, keeping it available 
for reinsertion into MMC. Figure 8 shows the block 
diagram for M1. 

Branching or transferring within the microprogram 
is provided by two modes: unconditional transfers of 
full 10-bit width and conditional transfers of four-bit 
width. The M1 character carries the time base whose 
signal is distributed to other characters. 

Figure 8 — M1 character block diagram 
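The sequencing behaviour of M1 may be sketched as below. The names are hypothetical. In particular, the text states only that conditional transfers are four bits wide; the assumption made here, that the four bits replace the low-order bits of the MMC, is for illustration and is not confirmed by the source.

```python
class M1:
    """Model of the micromemory counter (MMC) and save (S) register."""

    def __init__(self):
        self.mmc = 0  # ten-bit micromemory address
        self.s = 0    # save register for subroutine jumps

    def step(self):
        # Only the low five bits increment, so the address circulates
        # within a ring of 32 words.
        self.mmc = (self.mmc & ~0x1F) | ((self.mmc + 1) & 0x1F)

    def transfer(self, address):
        # Unconditional transfer: full ten-bit address.
        self.mmc = address & 0x3FF

    def conditional_transfer(self, condition, address4):
        # ASSUMPTION: the four-bit address replaces the low four MMC bits.
        if condition:
            self.mmc = (self.mmc & ~0xF) | (address4 & 0xF)
        else:
            self.step()

    def save(self):
        self.s = self.mmc

    def restore(self):
        self.mmc = self.s
```

Starting from address 0x3FE, two successive steps give 0x3FF and then 0x3E0; the program remains in the same 32-word ring until a transfer is issued.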

M2 character 

The M2 character contains a micromemory word 
register. The register is 49 bits long, providing for a full 
micromemory word. The forty-nine bits are divided into two 
16-bit fields and a 17-bit field. The first and the second 
fields are instructions and the third is a constant. The 
second instruction is transferred into the register 
location of the first for execution, resulting in sequential 
execution of the two instructions in the micromemory 
word. Timing is derived from the timing base on the 
M1 card. Figure 9 shows the block diagram of M2. 

MM character 

The MM character contains the micromemory array. 
The address register and word register for the array 
are located on M1 and M2 respectively. MM is a read-only 
array. The presence of an address on the input 
lines causes the contents of the referenced location to 
appear on the output lines after an appropriate delay. 
The MM character consists of 256 words of 49 bits 
each. Figure 10 shows the block diagram of MM. 

Figure 9 — M2 character block diagram 

Several MM characters may be combined to form a 
larger micromemory array. The maximum organization 
is 1024 words by 98 bits. 

P1 character 

The P1 character is a scratch pad memory of 256 
bits of storage with associated address decode logic, 
address register and data register. The scratch pad is 
arranged into 16 registers of 16 bits each. Figure 11 
is a block diagram of P1. 

The P1 character is connected to the L3 character 
through which its data flows. Up to 16 P1s may be 
connected in series to produce a total scratch pad of 
256 registers. Generally the bit width will match that 
of the logic unit. 

P2 character 

The P2 character is an expandable eight-bit counter 
with byte look-ahead logic. The introduction of a time 
signal produces a real-time binary clock. The counter 
may be read in parallel and is resettable to any desired 
value. Zero detection is provided which may optionally 
interrupt the microprogram and/or the main program. 
The P2 character is connected to the L3 character 
through which data and control pass. Figure 12 shows 
the block diagram detail. 

The P2 character contains control logic allowing the 
counter to be in a run state or stop state dependent 
upon microprogram control. 

P3 character 

The P3 character provides the capability of switching 
any three input channels to any three output channels. 
A 16-bit width is provided. This configuration allows 
three simplex simultaneous connections. Figure 13 
shows the block diagram for the switch. 

Figure 10 — MM character block diagram 

Figure 11 — P1 character block diagram 

Figure 12 — P2 character block diagram 

Figure 13 — P3 character block diagram 

Figure 14 — Four stages of expandability 

The input and output channels of P3 may be con- 
nected to any external interfaces which are electrically 
compatible. Storage is provided on the character for 
nine bits of control information establishing the state 
of the switch. 

There is no restriction on the switch state; all pos- 
sible configurations are allowed (such as three inputs 
to three outputs, one input to three outputs, three 
inputs to one output, etc.). 
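One way to picture the nine control bits is as a 3 x 3 crosspoint matrix, one bit per input/output pair. The sketch below rests on that assumption. The bit layout, the class name, and the rule that several inputs switched to one output are OR-ed together are all hypothetical, since the text gives only the number of control bits and the allowed configurations.

```python
class P3Switch:
    """Three 16-bit inputs switched to three 16-bit outputs."""

    def __init__(self):
        # Nine control bits; bit (3 * inp + out) closes one crosspoint.
        # This layout is an assumption made for illustration.
        self.control = 0

    def connect(self, inp, out):
        self.control |= 1 << (3 * inp + out)

    def route(self, inputs):
        outputs = [0, 0, 0]
        for inp in range(3):
            for out in range(3):
                if self.control >> (3 * inp + out) & 1:
                    # ASSUMPTION: inputs sharing an output are OR-ed.
                    outputs[out] |= inputs[inp] & 0xFFFF
        return outputs
```

Closing the three crosspoints of input 0 gives the "one input to three outputs" state: every output then carries the word present on input 0.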

Hardware applications 

Provided these ten characters and given a design 
performance specification, the decisions the designer 
must make involve considerations of character types 
and selection of word lengths. 
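The selection task can be made concrete with a small sizing calculation based on the character capacities given earlier (G1: four registers of eight bits; L1: eight bits wide, at most four bytes; MM: 256 words of 49 bits, at most 1024 words by 98 bits). The function below is a hypothetical aid for a single logic unit, not part of the Hughes design procedure, and it counts only the G1, L1, and MM characters.

```python
from math import ceil


def character_counts(word_bits, registers, micro_words, instruction_groups=1):
    """Estimate G1, L1, and MM character counts for one logic unit."""
    if word_bits % 8 or not 8 <= word_bits <= 32:
        raise ValueError("word length is assumed to be 8, 16, 24, or 32 bits")
    if not 1 <= micro_words <= 1024 or instruction_groups not in (1, 2):
        raise ValueError("micromemory is limited to 1024 words by 98 bits")
    bytes_wide = word_bits // 8
    return {
        # One G1 per byte of width for every four registers.
        "G1": bytes_wide * ceil(registers / 4),
        # One L1 per byte of word length.
        "L1": bytes_wide,
        # One MM per 256 words for each 49-bit instruction group.
        "MM": ceil(micro_words / 256) * instruction_groups,
    }
```

Under these assumptions an eight-bit machine with four registers and 256 microprogram words needs one each of G1, L1, and MM; widening it to 16 bits doubles the G1 and L1 counts and leaves the micromemory unchanged.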

Figure 14 illustrates the levels of machine complexi- 
ties available to the designer. Part A illustrates a very 
basic eight-bit machine, with simple logical, I/O, and 
register capabilities. Part B is the machine of part A 
expanded to 16 bits in its logic and register portions; 
however, no new functional capabilities have been added. 
Functional expansion is demonstrated in part C, where 
an eight-bit adder card and four eight-bit registers are 
added. Part D represents a significantly greater jump. 
Illustrated is the dual-logic unit capability of the 
character set. If desired, it is possible to have two logic 
units, with different but coordinated microprograms, 
operating in parallel. They share the same sequencer 
(M1), which both control. The G1 bank is common to 
both logic units. 

Part E illustrates an even higher level of expansion. 
Two totally independent micromemory units (memory 
and sequencer) drive three different logic units, linked 
together through G1 cards. This level of complexity 
can be carried to an almost limitless expansion of 
micromemories and logic units bound together by shared G1 
characters. A comparison of parts A and E of Figure 
14 illustrates the versatility of the character set as it 
is adapted to both simple and complex situations. 

With the hardware specified, the next major task 
is the writing of microprograms. As stated before, in 
machines of this type this is as important as the hard- 
ware design. Often the only essential difference between 
units designed for different purposes is their micro- 
programs. 

The microprogram repertoire designed for the char- 
acter set is described in the next section. 

Description of microprogram repertoire 

The micromemory word provides the control 
necessary for the functions of the characters under its direct 
influence. All the characters so controlled are defined 
to belong to a common instruction group. There is one 
and only one M2 character per instruction group. A 
phase group usually consists of one or two 
co-instruction groups containing a common timing base. There 
is one and only one M1 character per phase group as 
illustrated below. 


[Diagram: a phase group of two instruction groups, each 
with its own M2 and MM characters] 


In a phase group containing two instruction groups, 
one micromemory word, accessed from the first 
micromemory array (MM), operates upon and through its 
logic unit while the other word, accessed from the second 
micromemory array (MM), operates upon a second 
logic unit. Operations are carried out simultaneously 
in each unit with some cross translation. The option 
of including a second micromemory word allows for 
greater system capability by providing simultaneous 
operations; however, this does not affect the number 
of bits in the data word. (The data width is 
independently variable by byte.) 

A micromemory word is composed of two 16-bit 
fields and a 17-bit field: two instruction fields and a 
constant field (see Figure 15). The first and second 
instruction fields are identical, differing only in that 
execution of the second instruction follows the first by 
1/2 of cycle time (a cycle time is the time required 
for a complete cycle of the micromemory). The 
instructions can access the constant field, introducing 
into the data stream this constant from the 
micromemory. At those times when the constant field is not 
used as such, it takes on additional capability as a 
transfer and machine control field. 

Figure 15 — Micromemory word 

Figure 16 — Instruction field 
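The division of the 49-bit word into its three fields can be written out as bit arithmetic. The field widths are those given in the text. The ordering of the fields within the word (first instruction in the high-order bits, constant in the low-order bits) and the function names are assumptions made for illustration.

```python
WORD_BITS = 49          # 16 + 16 + 17
INSTR_MASK = 0xFFFF     # 16-bit instruction field
CONST_MASK = 0x1FFFF    # 17-bit constant field


def pack_word(first, second, constant):
    """Assemble a micromemory word (assumed order: first, second, constant)."""
    return ((first & INSTR_MASK) << 33
            | (second & INSTR_MASK) << 17
            | (constant & CONST_MASK))


def split_word(word):
    """Return (first instruction, second instruction, constant)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("a micromemory word is 49 bits long")
    first = (word >> 33) & INSTR_MASK
    second = (word >> 17) & INSTR_MASK
    constant = word & CONST_MASK
    return first, second, constant
```

A word packed from the fields 0xABCD, 0x1234, and 0x1FFFF splits back into the same three values, and the largest packed word occupies exactly 49 bits.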

Instruction Fields — Each instruction field is divided 
into three subfields — source, operator, and destination 
subfields as shown in Figure 16. 

The source specifies the origin of the data to be 
operated upon as defined by the operator field. The 
destination specifies the location where the data result 
will be stored after the operation is performed. 

Source Subfield — The source subfield specifies the 
source of information for the micro-instruction. Data 
accessed by the source code appears on the input data 
bus. Typical sources are: 

G1-G16   The general set of registers 
E1-E12   The I/O registers located on the L3 characters 
CNT      The constant field from the micromemory word 
INC      The incremented value of the L register 
A        The A register located on card L2 
ADD      The sum from the L2 character 
L        The L register of the L1 character 
ECS      The error code 
*ADD     The sum from the L2 character of a 
         co-instruction group logic unit 

Operator Subfield — The operator subfield specifies 
the type of operation the micro-instruction involves. 
These operators operate upon the data from the input 
bus and present the result at the output bus. Typical 
operators are: 

RS1-31   A right shift from 1 to 31 positions 
LS1-31   A left shift from 1 to 31 positions 
MSK      The source data masked by the constant 
         field of the micromemory word 
NOP      The no-operation 
R1-R31   A left rotate from 1 to 31 positions 
COM      The ones complement 

Destination Subfield — The destination subfield 
specifies directly the register to receive the instruction 
result. These register designations are described below: 

G1-G16   The general set of registers 
E1-E12   The twelve I/O registers of the L3 characters 
B        The B register of the L2 card 
L        The logic register of the L1 card 
A        The A register of the L2 card 
*A       The A register of a co-instruction 
         group L2 character 
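The source, operator, destination form of a micro-instruction lends itself to a very small interpreter. The sketch below uses the mnemonics listed above but is otherwise hypothetical: the binary encoding of the subfields is not given in the text, so the instruction is passed symbolically, registers are held in a dictionary, and the mask operation is assumed to be a bitwise AND with the constant field.

```python
import re


def execute(state, source, operator, destination, width=16, constant=0):
    """Apply one symbolic micro-instruction to a dictionary of registers."""
    m = (1 << width) - 1
    # Source subfield: CNT selects the constant field, anything else a register.
    value = (constant if source == "CNT" else state[source]) & m
    shift = re.fullmatch(r"(RS|LS|R)(\d+)", operator)
    if operator == "NOP":
        result = value
    elif operator == "COM":
        result = ~value & m
    elif operator == "MSK":
        result = value & constant   # ASSUMPTION: mask means bitwise AND
    elif shift:
        kind, n = shift.group(1), int(shift.group(2))
        if not 1 <= n <= 31:
            raise ValueError("shifts and rotates run from 1 to 31 positions")
        if kind == "RS":
            result = value >> n
        elif kind == "LS":
            result = (value << n) & m
        else:
            n %= width
            result = ((value << n) | (value >> (width - n))) & m
    else:
        raise ValueError("unknown operator: " + operator)
    # Destination subfield: the register that receives the result.
    state[destination] = result
    return result
```

For example, with G1 holding 0x00F0 in a 16-bit machine, the instruction (G1, LS4, G2) leaves 0x0F00 in G2, and (G2, R8, L) then leaves 0x000F in the L register.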

Transfer Field — The transfer field allows for micro- 
program specification of both conditional and uncon- 
ditional transfers within the microprogram. The un- 
conditional transfer provides a ten-bit address, the full 
microprogram addressing capability, while conditional 
transfers provide four-bit addresses. At all times when 
a transfer is not effected (either conditional or uncon- 
ditional) the micromemory counter is incremented by 
one modulo 32. 

There are basically three testable functions. They 
are: (1) least significant bit — true; (2) most significant 
bit — true, and (3) all bits — false (true = 1, false = 0). 

Further, some of these functions may be tested as 
inputs to the logic unit or as outputs and in various 
combinations. 

There exist eleven conditional transfer test combina- 
tions and one unconditional transfer. 
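The three testable functions are simple predicates on a data word. The following sketch evaluates them for a given value and word width; the function and key names are hypothetical, and the eleven test combinations themselves are not enumerated in the text, so none is reproduced here.

```python
def condition_flags(value, width):
    """Evaluate the three basic testable functions on a data word."""
    value &= (1 << width) - 1
    return {
        "lsb_true": bool(value & 1),                     # least significant bit = 1
        "msb_true": bool((value >> (width - 1)) & 1),    # most significant bit = 1
        "all_false": value == 0,                         # every bit = 0
    }
```

An eight-bit value of 0x81 sets both bit tests and clears the all-false test; a zero word sets only the all-false test.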

An application of the character set 

In addition to investigation for use with the H4400, 
the Hughes Character Set was used in a test design of 
the NASA Modular Computer Breadboard (MCB). 
The NASA MCB, a prototype of an advanced aerospace 
computer, is a dual-redundant reconfigurable machine 
consisting of five different module types. One each of 
the Control Unit (CU), Arithmetic Unit (AU), Memory 
Unit (MU), and Input-Output Unit (I/O) are required 
for a working computer. The fifth module type, the 
Configuration Assignment Unit (CAU), is not dupli- 
cated. For a detailed description of the NASA MCB, 
see “Implementation of the NASA Modular Computer 
with LSI Functional Characters,” by Pariser and 
Maurer, in these Proceedings. 

Figure 17 shows how the NASA MCB can be im- 
plemented using the Hughes character set. Notice 
that the CU is the only module equipped with the 
double-logic unit feature. 

Figure 17 — MCB: Modular computer breadboard block diagram 


The design of the NASA MCB showed that a fairly 
complex computer could be implemented using only 
the ten characters. Comparison of the gate counts with 
that of a computer built to similar specifications in- 
dicates that design with the character set involves 
approximately 35 percent more gates (exclusive of 
ROM). The comparison machine was composed of 
23 different card types contrasted to the character 
set’s ten. The overall gate-to-pin ratio was 2.6 for the 
character set version and 0.75 for the comparison ma- 
chine. 

Table I is a representative sampling of the estimated 
MCB instruction execution rates. By these estimations, 
the character set version is capable of running as much 
as 55 percent faster than the prototype machine. A 
large part of the speed and versatility of the MCB 
were attributed to the total microprogram approach 
of the character set. Since each unit has its own micro- 
memory control, it was possible to utilize unit overlap 
to the maximum advantage. 

Performance specifications for machines built from 
the character set assume the following about the 
characters themselves. Each character involves 
approximately 300 gates based on SUHL II type logic. Most 
characters may be sub-partitioned into two identical 
LSI wafers of 150 gates each. Gate-to-pin ratios 5 vary 
from character to character with an overall average of 
about 2.6. Each level of gating must involve a propa- 
gation delay of no more than 12 nanoseconds to achieve 
the indicated speeds. Read-only-memory access time 
is assumed to be no more than 80 nanoseconds with a 
cycle time of 200 ns. 

Evaluation of the character set 

Design work to date indicates that most digital data 
processing equipment can be implemented using only 
the ten characters. Gate counts run higher than 
equipment configured from discrete IC's, with 140 percent 
of the IC gate count representing an approximate 
upper bound. Speeds appear to be comparable to the 
latest airborne development computers, and promise 
to be competitive with ground equipment as well. 

For all systems where maintainability is a factor, units 
constructed from the character set have the obvious 
advantage that only ten types of spares are needed to 
insure system repairability. Nine of the characters are 
identical in all applications. The tenth, the micro- 
memory, stores a unique program for each application. 
To bypass the requirement for spare ROM's of specific 
patterns, research is currently under way at Hughes 
to develop an electrically alterable ROM. The MM 
characters could be delivered “blank” from the 
manufacturer to be written into by the user with a 
one-shot process. 

Reliability of character-built LSI computers will be 
enhanced by the reduction in the number of lead-bonds. 
Beyond that, the most significant reliability factor 
probably will be the type of LSI technology chosen. 
Bipolar TTL is a candidate for the character set 
mechanization due to its speed and drive capabilities. 
MOS is also being considered for its high packaging 
densities and simplicity of manufacture. Use of either 
or both technologies is possible depending on system 
requirements. 

LSI enjoys a natural advantage in the diagnostic 
field. The arrays establish replaceable units which are 
quite large, thus minimizing the degree of fault isola- 
tion required. The character set in particular has several 
features beneficial to diagnostic procedures. The bussed 
structure provides several convenient points for ap- 
plication and observation of diagnostic signals. Also, 
there are only a certain number of allowable ways to 
inter-connect characters. This, plus the fact that there 
is no intervening logic, precludes the possibility of 
unexpected timing or logic problems arising. Once the 
fault detection and isolation problems are solved 
relative to a character, the solution is applicable to all 
combinations in which that character is found. 

Furthermore, since every character is under the 
control of some micromemory, a third major approach, 
along with more traditional hardware and software 
approaches, to diagnostics becomes available. Investi- 
gations have shown that microprogram techniques are 
extremely effective in both detecting and isolating 
faults in the characters. This approach also promises 
fast diagnostic speeds. Not only are the diagnostics 
carried out at micro-instruction speeds rather than 
machine-instruction speeds, but in large machines each 
micromemory can simultaneously diagnose the charac- 
ters under its control. 

As an example, consider the application of these 
techniques to the diagnosis of the NASA MCB. Each 
of nine micromemories can simultaneously diagnose 
seven to 38 characters each. Any fault need be isolatable 
to one of only 206 characters, for which a replacement 
is chosen, assuming an operator is present, from ten 
basic part types. (Of course, the NASA MCB actually 
reconfigures automatically in case of error.) 

Problems currently under investigation are diagnosis 
of the micromemory itself, amount and type of hard- 
ware required, and the applicability of more conven- 
tional techniques. Goals include the development of 
techniques for 100 percent fault detection and isolation 
to the character level. 

The area of application stressed for the character 
set was computer implementation. Though the comput- 
er makes a meaningful application, there is, however, 
great economical advantage to be gained through ap- 
plication of the characters to digital equipment of 
unique or low volume design. Using the character 
methodology in such systems can reduce by large factors 
the engineering costs, design, and checkout time 
involved. To effectively achieve such a goal several 
design aids are desirable: a character assembler, a 
microprogram assembler, and a system simulator. These 
three programs would allow for complete design auto- 
mation capability. 

The character assembler input would consist of en- 
coded instructions having the information content of 
a block diagram as exemplified by Figure 3. This in- 
formation in conjunction with the character charac- 
teristics (which form the data base of the assembler) 
is processed by the assembler to produce an output 
consisting of wiring information for the interconnection 
of the characters. The character assembler output may 
be in the form, for example, of a wire list, an N/C 
tape for automatic wiring machine, or a tape input 
to a routing program for printed circuit card etch layout. 

TABLE I — Estimated MCB execution times 

INSTRUCTION                          TIME IN µs 
FLOATING POINT ADD/SUBTRACT          5.4 + A 
FLOATING POINT MULTIPLY              21.8 + A 
FLOATING POINT DIVIDE                21.0 + A 
STORE (MAIN MEMORY)                  7.0 
LOAD (MAIN MEMORY)                   7.7 
CONDITIONAL BRANCH                   3.8 
ALL SHIFTS (REGARDLESS OF LENGTH)    5.8 
OR/AND                               6.2 
DIRECT ADD                           4.6 
ADD/SUBTRACT                         6.8 
MULTIPLY                             20.8 

A = (EQUALIZATION + NORMALIZATION TIME) < 5.2 µs FOR 32 BITS. 

The encoding information for the micromemory 
array is provided on tape by the microprogram as- 
sembler. This tape is used directly in the manufacture 
or alteration of the array. The microprogram code is 
assembled with the usual aids provided by machine 
language assemblers. 

System simulation would be accomplished from (1) in- 
formation of the machine structure as input to the 
character assembler, (2) the microprogram code as in- 
put to the microprogram assembler and (3) instructions 
from the system designer input directly to the system 
simulator. The degree to which system checkout would 
be accomplished would of course be dependent upon 
the sophistication of the simulator. However, because 
of the high level of definition of the characters them- 
selves the simulator would not be concerned with details 
of the Boolean logic or signal interface consistency 
between characters. Therefore a worthwhile simulator 
is seen as a feasible task. 

Thus, the complete system (microprogrammable 
characters, character assembler, microprogram 
assembler, and system simulator) provides the system 
designer the capability for total system design from 
his desk. Furthermore, he is not concerned with logic 
design in any form. When he specifies the following: 

1. character configuration 

2. microprograms 

3. simulation instructions 

these items are provided for: 

1. character assembly 

2. back panel wiring 

3. micro-array encoding 

4. system checkout 

all without the services of a logic designer or the 
technician’s help. In fact, it is conceivable that no human 
intervention need take place between the system 
designer and his designed hardware! 

Fault location in cellular arrays* 


by K. J. THURBER 

Honeywell Systems and Research Center 
St. Paul, Minnesota 


INTRODUCTION 

Testing of complex integrated cellular logic circuits 
fabricated using LSI techniques has become a source of 
concern to users and manufacturers. Since an economi- 
cally feasible solution to testing problems is not visible 
for the complex arrays contemplated for the near future, 
manufacturers have acknowledged the seriousness of 
the problem. Currently some observers believe that LSI 
cannot be tested because general procedures for testing 
and diagnosing digital circuits are applicable to small 
networks of approximately 30 gates, while cellular 
arrays are contemplated as containing hundreds or 
thousands of gates on one chip. However, if arrays are 
constrained to be in a cellular form, then testing 
problems can be simplified and test schedules can be 
produced which use the interconnection structure of 
cellular arrays. 

In some cases the iterative interconnection structure 
of cellular arrays enables derivation of test schedules 
that exhibit an iterative nature, thus reducing the 
complexity of the testing problem in comparison with 
testing problems encountered in testing a noniterative 
structure containing an equal number of gates. It has 
been shown that the structure of single-rail cascades can 
be used to great advantage in the derivation of test 
algorithms for cascades 6 and that this testing can be
accomplished from the edge of the cascade. These results 
are extendable to a large class of arrays. However, 
Kautz 1,2 has shown that cellular arrays exist which
cannot be tested from their edge terminals. 

Problem definition 

The iterative interconnection structure of cellular 
arrays allows decomposition of testing problems for LSI 
cellular arrays into several subproblems. One sub- 
problem is the testing of single-rail cascades, such as the 
one shown in Figure 1. These cascades can be used in the 
production of more-complex cellular arrays, and tech- 
niques can be derived such that if a single-rail cascade 
can be tested then certain complex arrays can be tested. 
Examination of problems encountered during solution 
of the problem of testing single-rail cascades using only 
input and output terminals of cascades produces 
methods that can be used to test more-complex arrays. 
Specifically, the solution of problems involved in testing 
single-rail cascades lends insight to methods useful in 
testing cellular arrays from their edge terminals by 
computers using an average of only two or three tests 
per cell contained in the array. 

Figure 2 indicates the construction of an important 
class of cellular arrays. An example of an important class 
of arrays that has this interconnection structure is a 
cutpoint array. 4 This array consists of collector rows and 
vertical cascades. Busses extend across all collector rows 
and distribute every variable across the vertical cas- 
cades. This construction reduces the testing of this 
array to the testing of a single-rail cascade, since each 
collector row can be tested as a single-rail cascade (under 
the added assumption that both a 0 and a 1 can be 
placed on the input to each buss that extends across 
the collector rows) and each vertical cascade can be 
tested as a single-rail cascade. Output values of vertical 
Figure 1 — Interconnection structure of cascades

Figure 2 — Construction of a testable cellular array 


cascades are measured at the bottom of the array 
whereas collector row output values are measured on 
the right-hand side of the array. Admittedly, it would
be desirable to test all collector rows (and all vertical
cascades) simultaneously; however, to accomplish this, 
a restriction on the array structure must be made that 
restricts the class of testable arrays until the procedure 
becomes practically useless. 

Practical considerations 

Consideration of testing problems produced by LSI 
chips may help develop test algorithms that could be 
used to test today’s complex printed circuit boards. 
However, complex cellular arrays in practice will be 
more difficult to test than printed circuit boards. 
Consider that not only must exact error locations be 
indicated, but that a decision must be made based on 
the number of errors and their locations as to what can 


be done with imperfect arrays. Are imperfect arrays 
discarded or can they be salvaged in some manner? 
Minnick 5 and Spandorfer 8 have suggested that extra
vertical cascades and collector rows be installed at
predetermined intervals in arrays, such as in Figure 2.
If a vertical cascade or collector row has an error, then 
the extra cascade or row could be used to produce the 
correct function. 

Before any test procedures can be established, an 
error or circuit failure criterion must be established 
which allows definition of possible error types that may 
appear in LSI construction. In a later section an 
expanded allowable set of errors for certain types of 
cellular arrays will be presented. 

Placing an accessible test pad on an interconnection 
between cells reduces the effective area usable for the 
cells. For this reason attempts should be made to 
accomplish all testing and location of faulty cells from
the terminals of the array without any test pads being 
included in the array. 

A test schedule could verify the complete truth table, 
transfer function, or state table for any given device; 
however, this procedure would require too much time 
and would add greatly to the expense of the array. 
Instead of a complete verification procedure, another 
solution could be to test certain input conditions on a
probabilistic or expected utilization basis; however,
this method is still very unsatisfactory. A feasible 
approach is to decide on a dominant failure mode from 
which a set of allowable errors can be derived for each 
cell type used in arrays under consideration. With this 
knowledge manufacturers could construct arrays using 
certain interconnection structures and could design cells 
with redundant properties. This would increase the
probability that a failure of one of the dominant types
produces a cell error that is contained in the set of
allowable errors.

Generation of tests and test equipment 

Redundant design, failure modes, allowable errors, 
and required confidence level contribute to the deter- 
mination of the number of tests required; however, the 
array’s structure can almost determine the number of 
tests independently of these factors. Test schedules are 
constructed to verify whether each cell is producing its 
specified function. This method of testing was chosen in 
preference to verifying an array’s truth table because 
the number of tests needed is generally much less than
$m(2^{n+1})$, where m functions of n + 1 variables are
produced. Under certain assumptions, choosing test
schedules capable of accomplishing the task of locating
every error in arrays such as shown in Figure 2 is 
plausible (see Theorem 1), and these test schedules can 
be programmed for testing using digital computers. 
Because of their iterative structures, cellular arrays 
simplify problems encountered in the detection and 
location of faults. 

Since test schedules can be programmed for single-rail 
cascades, computers will be able to test many types of 
arrays with very minor software input changes. In 
particular, for the single-rail cascade under the assump- 
tions of Theorem 1, a general fault detection program 
could be written. To test a cascade the only needed input 
information would be the cell types and their location 
in the cascade. With this information the general 
program is able to test all cascades of one type. When 
the type of cascade changes, this information can be 
given the computer as input data and all cascades of the 
new type can then be tested. Because of the structural 
interconnection of arrays shown in Figure 2, no repro- 
gramming of the computer is needed when a new type 
of array appears. 

Assumptions and definitions 

Figure 1 illustrates the interconnection structure of a 
Maitra cascade. 3 Every cell in the cascade is a two-input, 
one output cell. It is assumed that the Boolean variables 
applied to the cascade are numbered as illustrated on the 
cascade shown in Figure 1. All testing of the cascade is 
accomplished using only the input leads and the output 
lead of each cascade (and of arrays). The ability to 
measure the functional value produced by a cell by 
means of probing a buss connecting two adjacent cells is 
not assumed. To minimize the “uncertainties” (the 
functional values between cells cannot be measured and 
the location of the error is unknown; therefore, the 
functional values between cells are uncertain) involved 
in testing cascades, it is assumed that cell n is tested first 
(see Figure 1), then cell n-1, etc. If an error occurs in
cell n-j, its propagation may be stopped by one of cells
n-1, n-2, ..., n-j+1. Once cell n is tested, it may be
set such that it transmits the output of cell n-1 to the
output terminal of the cascade. In this manner (under
certain error assumptions) the cells may be tested in the
following order until error location results: n, n-1, ..., 1.
The number of tests needed to test a cellular cascade is
O(n)*, where n is the number of cells in the cascade.

It is assumed that only one error (faulty cell) may
appear in a cascade. Also, the interconnections between
cells do not fail, and the error is time independent; i.e.,
if cell m is in error at time $t_1$, then cell m is still in error
at $t_2 > t_1$ and the error type in cell m has not changed.
Further, the input and output leads of the cascade do
not fail.

* See Definition 7.

It is assumed that the 12 allowable cell functions for a
Maitra cascade are $f_1$, $f_2$, $f_4$, $f_5$, $f_6$, $f_7$, $f_8$, $f_9$,
$f_{10}$, $f_{11}$, $f_{13}$, and $f_{14}$. (See Definition 1 for an
explanation of the notation $f_i$.) Seven allowable errors are
assumed for each cell; these are $f_{15}$ (s-a-1; stuck-at-one),
$f_0$ (s-a-0; stuck-at-zero), $f_{15-p}$ (complementation, where
p is the cell function), $f_{12}$ (the input X), $f_3$ (the
complement of the input X), $f_{10}$ (the input Y), and $f_5$
(the complement of the input Y). These seven errors consist
of the two failure types (s-a-0 and s-a-1) usually assumed by
most fault diagnosticians augmented by $f_{15-p}$, $f_{12}$, $f_3$,
$f_{10}$, and $f_5$. [Note that $f_{10}$ and $f_5$ have different
allowable error sets; i.e., $E_{f_{10}} = (f_0, f_{15}, f_5, f_{12}, f_3)$
and $E_{f_5} = (f_0, f_{15}, f_{10}, f_3, f_{12})$.]

Definition 1. The cell functions are numbered as
follows:

X_i Y_{i-1}   f0  f1  f2  f3  f4  f5  f6  f7  f8  f9  f10 f11 f12 f13 f14 f15

 0    0        0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1
 0    1        0   0   1   1   0   0   1   1   0   0   1   1   0   0   1   1
 1    0        0   0   0   0   1   1   1   1   0   0   0   0   1   1   1   1
 1    1        0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1
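
The numbering in Definition 1 can be read as a truth-table encoding: the value of $f_i$ for inputs X and Y is bit 2X + Y of the index i. The following is a minimal Python sketch of that reading and of the error set assumed for each cell. The helper names `cell_function` and `allowable_errors` are illustrative assumptions and do not appear in the paper.

```python
def cell_function(i, x, y):
    """Return f_i(x, y): bit (2*x + y) of the index i, per Definition 1."""
    return (i >> (2 * x + y)) & 1


def allowable_errors(p):
    """Error functions assumed for a cell specified as f_p.

    Candidates are s-a-1, s-a-0, complementation, X, X', Y and Y'.
    The specified function itself and repeated entries are dropped.
    """
    errors = []
    for q in (15, 0, 15 - p, 12, 3, 10, 5):
        if q != p and q not in errors:
            errors.append(q)
    return errors


if __name__ == "__main__":
    # Reprint the four rows of the Definition 1 table.
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, [cell_function(i, x, y) for i in range(16)])
    print(allowable_errors(14))  # seven distinct error functions
    print(allowable_errors(10))  # five: f10 itself and the repeated f5 drop out
```

Under this reading $f_{14}$ is OR, $f_8$ is AND, and $f_6$ is EXCLUSIVE-OR, and the two shortened lists for $f_{10}$ and $f_5$ agree with the note above.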

Definition 2. An error occurs in a cell whenever the
cell produces a function that is not the same as the 
function specified for that cell. 

Definition 3. G = (fa fa f\, fa fa fi, fa fa fio, fn, fa, 

AO. 

Definition 4. $I_p$ denotes (1, 2, 3, 4, ..., p).

Definition 5. The error function E is a mapping
from $G \times I_n$ to G, where $E(f_p, j) = f_q$ denotes that cell j
was theoretically to produce $f_p \in G$ but instead it
produced $f_q \in G$. Clearly, $E(f_p, j) = f_p$ indicates that cell j
does not have an error occurring in it.

Definition 6. X* means either X or X', but not both. 

Definition 7. O(n) means the same order of mag- 
nitude as n. 

A necessary and sufficient condition for fault location in cascades

Location of a single fault in a cascade is considered in 
this section. A necessary and sufficient condition for 
location of a single fault in a cascade is proven. The 
proof of Theorem 1 can be utilized to obtain an algorithm
to locate faults in a cellular cascade or array.

Theorem 1. Given a cascade with n cells, then the error
can be located if and only if for every
$i \in I_n$:

(1) $E(f_{14}, i) \ne f_{15}, f_{12}$

(2) $E(f_{11}, i) \ne f_{15}, f_3$

(3) $E(f_8, i) \ne f_0, f_{12}$

(4) $E(f_2, i) \ne f_0, f_3$

(5) $E(f_6, i) \ne f_9, f_{12}, f_3$

(6) $E(f_9, i) \ne f_6, f_{12}, f_3$

(7) $E(f_{13}, i) \ne f_{12}, f_{15}$

(8) $E(f_7, i) \ne f_3, f_{15}$

(9) $E(f_4, i) \ne f_0, f_{12}$

(10) $E(f_1, i) \ne f_0, f_3$

(11) $E(f_{10}, i) \ne f_0, f_{15}, f_5$

(12) $E(f_5, i) \ne f_{15}, f_0, f_{10}$

Proof: The proof is an induction proof. Clearly,
the theorem is true for the case n = 1.
Assume that the theorem is true for a
positive integer k and consider a cascade
with k + 1 cells. Given the cell function
for cell k + 1, if it can be shown that the
error can be located in cell k + 1 if and
only if assumptions (1) through (12) are
valid for cell k + 1, then the proof is
complete.

Assume conditions (1) through (12).
This part of the proof is now completed in
Figures 3 through 14. Note that if $C_0$,
$C_1, \ldots, C_i$ are used to set $Y_i = C$ at time
$t_1$, then if $Y_i = C$ is wanted at time $t_2$ and
$C_0, C_1, \ldots, C_i$ are utilized again, $Y_i$ is the
same value as it was at $t_1$; however, all that
can be said about $Y_i$ is that it is either C
or C', but not both. This fact is used in the
proof of this theorem. In the figures with
the circled function number it may be
necessary to add one more test to determine
whether the cell is in error or is
receiving the complemented sequence.

Figures 3 through 14. Test decision maps for $f_{14}$, $f_{11}$,
$f_8$, $f_2$, $f_6$, $f_9$, $f_{13}$, $f_7$, $f_4$, $f_1$, $f_{10}$, and $f_5$,
respectively.

The proof of the other half of the
theorem will be by contradiction. Assume
that the error can be located, but that the
restrictions (1) through (12) are not
needed. Then it can be verified that the
following pairs of conditions give the same
output at the cascade’s terminal. Since the
two conditions give the same outputs, the
error cannot be located, which is a contradiction
of the assumption; therefore,
the assumption that the restrictions are
not needed is incorrect and the proof is
completed. After (1) an abbreviated notation
is used. Note: Using the Test
Decision Maps and the contradiction part
of this proof one can actually determine
the values of $Y_{k-1}$.

(1) Yk — 1,1,1 and E(fu, k + 1) = fu 
are equivalent to Y k = 0, 1, 0 and 
E(fu, k + 1) = /15 at the cascade’s 
output terminal. 

Yk = 0, 0, 0 andl?(/i4, k + 1) = fu 
are equivalent to Y k = 0, 1, 0 and 
E(fu, k + 1) = /12 at the cascade’s 
output terminal. 

(2) Yu = 0, 0, 0 and E(fn, ft + 1) — jTn ; 
Y k = 0, 0, 1 and E(fn, ft + 1) = 

Y k = 1, 1, 1 and E(f n , ft + 1) = f u ; 
Y k = 0, 0, 1 and E (f n , ft + 1) = / 16 . 

(3) Y k = 1, 1, 1 and E(f Sy ft + 1) = / 8 ; 
Yk = 1, 0, 1 aiuji E(fg, k + 1) = / 12 . 

Y k = 0, 0, 0 aind E(f s , k + 1) = / 8 ; 
Y k = 1, 0, 1 and E(f s , k+ 1) = / 0 . 

(4) Y k = 1, 1, 1 and tf(/ 2 , ft + 1) = / 2 ; 

= 0, 1, 1 arid E(f 2 , k + 1) = 

F* = 0, 0, 0 and E(f 2 , k + 1) = /*; 
Fa = 0, 1, 1 and E(f 2 , k + 1) = /<>. 

(5) Y k = 1, 1, 1 and S(/«, ft + 1) = / 6 ; 
F ft = 0, 1, 0 and E(f„ ft + 1) = 

Fa = 0, 0, 0 and E(f 6 , ft + 1) = / 6 ; 
Fa - 0, 1, 0 and E(f t , ft + 1) - U 
Y k = 1, 0, 1 and £(/*, ft + 1) - /«; 
Fa - 0, 1, 0 and E(f e , ft + 1) - / 9 . 

(6) Fa = 1, 1, 1 and E(f 9 , ft + 1) - / 9 ; 
Fa = 0, 1, 0 and E(J 9 , k + 1) = 

Yk — 0, 0, 0 and E(f 9 , ft -f- 1) — / 9 ; 
Fa = 0, 1, 0 and i£(/ 9 , ft -f- 1) = / 3 * 
Fa = 1, 0, 1 and E(f 9 , ft + 1) = / 9 ; 
Fa = 0, 1, 0 and E(f h ft + 1) - /„. 

(7) Fa =1,1,1 and ^(/ka, ft + 1) = /«; 
Fa = 0,1, land^(/i 3 , ft+ 1) = / u . 

Fa = 0, 0, 0 and E(f lz , ft + 1) = / 13 ; 
Fa = 0, 1,1 and E(f±s, ft + 1) = /i B . 

(8) Fa = 1, 1, 1 and E(f 7 , ft + 1) = / 7 ; 
Fa = 1, 0, 1 and E(fi, ft + 1) = / 3 . 

Fa = 0, 0, 0 and E(fi, ft + 1) = / 7 ; 
Fa = 1, 0, 1 and E(J 7 , ft + 1) = / 15 . 


(9) Fa = 1, 1, 1 and E(f A , ft + 1) = / 4 ; 
Fa = 0, 0, 1 and E{f A , ft + 1) = / 0 . 

Fa = 0, 0, 0 and #(/ 4 , ft + 1) = / 4 ; 
Fa = 0, 0, 1 and E(j A , ft + 1) = /i 2 . 

(10) Fa = 1, 1, 1 and j&C/i, ft + 1) = h) 
Yk = 0, 1, 0 and E(f h ft + 1) = f 0 . 

Yk = 0,0,0 and E(f h ft + 1) = A; 
Fa = 0, 1, 0 and E(f h ft + 1) = / 3 . 

(11) Fa = 1, 1, 1 and E(fi 0 , ft + 1) = / 10 ; 
Fa = 0, 1, 0 and E(f w , ft + 1) = / M . 

Fa = 0, 0, 0 and E(f\ 9 ,k + 1) = /lo) 
Fa = 0, 1,0 and E[f 10 , ft + 1) = /o. 

Fa = 1, 0, 1 and E(f 10 , ft + 1) = /i 0 ; 
Fa = 0, 1, 0 and E(f 10} ft + 1) = U 

(12) Fa = 1, 1, 1 and E(f b , ft + 1) = f 6 ; 
Fa = 0, 1, 0 and E(f 5 , ft + 1) = / 0 . 

Fa = 0, 0, 0 and E(f 5 , ft + 1) = /sj 
Fa = 0, 1, 0 and tf(/ 6) ft + 1) = /«. 

Fa = 1, 0, 1 and (f 5 , ft + 1) = /#; 
Fa = 0, 1, 0 and E{f h ft + 1) = /io- 
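
The equivalences listed in the contradiction half of the proof can be checked mechanically. The sketch below is a hedged illustration, not part of the original proof. It assumes the truth-table reading of Definition 1 (output of $f_i$ is bit 2X + Y of i) and that each equivalence must hold for every choice of the three X inputs applied to cell k + 1. The names `cell_function`, `EQUIVALENT_PAIRS`, and `indistinguishable` are hypothetical.

```python
from itertools import product


def cell_function(i, x, y):
    """f_i(x, y) under the numbering of Definition 1."""
    return (i >> (2 * x + y)) & 1


# Each entry: (specified function p, Y sequence reaching the good cell,
#              error function q, Y sequence reaching the faulty cell).
EQUIVALENT_PAIRS = [
    (14, (1, 1, 1), 15, (0, 1, 0)), (14, (0, 0, 0), 12, (0, 1, 0)),
    (11, (0, 0, 0), 3, (0, 0, 1)), (11, (1, 1, 1), 15, (0, 0, 1)),
    (8, (1, 1, 1), 12, (1, 0, 1)), (8, (0, 0, 0), 0, (1, 0, 1)),
    (2, (1, 1, 1), 3, (0, 1, 1)), (2, (0, 0, 0), 0, (0, 1, 1)),
    (6, (1, 1, 1), 3, (0, 1, 0)), (6, (0, 0, 0), 12, (0, 1, 0)),
    (6, (1, 0, 1), 9, (0, 1, 0)),
    (9, (1, 1, 1), 12, (0, 1, 0)), (9, (0, 0, 0), 3, (0, 1, 0)),
    (9, (1, 0, 1), 6, (0, 1, 0)),
    (13, (1, 1, 1), 12, (0, 1, 1)), (13, (0, 0, 0), 15, (0, 1, 1)),
    (7, (1, 1, 1), 3, (1, 0, 1)), (7, (0, 0, 0), 15, (1, 0, 1)),
    (4, (1, 1, 1), 0, (0, 0, 1)), (4, (0, 0, 0), 12, (0, 0, 1)),
    (1, (1, 1, 1), 0, (0, 1, 0)), (1, (0, 0, 0), 3, (0, 1, 0)),
    (10, (1, 1, 1), 15, (0, 1, 0)), (10, (0, 0, 0), 0, (0, 1, 0)),
    (10, (1, 0, 1), 5, (0, 1, 0)),
    (5, (1, 1, 1), 0, (0, 1, 0)), (5, (0, 0, 0), 15, (0, 1, 0)),
    (5, (1, 0, 1), 10, (0, 1, 0)),
]


def indistinguishable(p, y_good, q, y_bad):
    """True if a good f_p cell fed y_good and a faulty cell producing f_q
    fed y_bad give identical outputs for every X sequence of length 3."""
    for xs in product((0, 1), repeat=3):
        good = [cell_function(p, x, y) for x, y in zip(xs, y_good)]
        bad = [cell_function(q, x, y) for x, y in zip(xs, y_bad)]
        if good != bad:
            return False
    return True


if __name__ == "__main__":
    for pair in EQUIVALENT_PAIRS:
        print(pair, indistinguishable(*pair))
```

Every listed pair is indistinguishable at the output of cell k + 1, which is why the corresponding errors must be excluded by conditions (1) through (12).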

If the cascade meets the assumptions of Theorem 1, 
then Theorem 1 can be used to determine test schedules 
for the location of an error in cascades. It should be 
noted that when cell k is tested, one obtains information
about the cells k-1, k-2, ..., 1, and therefore a test
schedule with O(n) tests will test any cascade with n
cells under the allowable error set 6. Clearly, if the
conditions of Theorem 1 are relaxed, then fault detection 
(and maybe isolation) can be accomplished in the same 
number of tests; however, if one is only interested in 
fault detection, Theorem 2 is the best technique to use. 

If a more complex cascade than the cascades con- 
sidered here is under consideration, then a good 
understanding of the method used to derive the 
theorems in this paper will allow one to extend the 
theories presented. If the cell functions $f_0$, $f_3$, $f_{12}$, and $f_{15}$
are allowed, then the fault techniques may be easily
extended since none of these functions depend on the Y
value; however, one must exercise care in the use of the
theory because it is based on the ability of the tester to
place theoretically both a 0 and a 1 on the Y interconnection,
and examples (trivial) in which this cannot
be accomplished do exist. 

Fault detection in Maitra cascades 

In this section the detection of a single fault in a 
cascade is considered. The theory for this section is 
based on the observation that every n cell Maitra 
cascade (as defined in this paper) produces a function
dependent on $X_0$ 6.

The purpose of this detection scheme is to utilize 
exactly two tests to detect whether a cascade has a 
faulty cell. 

Theorem 2. Let the Maitra cascade have n cells. If $C_1$,
$C_2, \ldots, C_n$ are such that $f(X_0, C_1, C_2, \ldots, C_n) = X_0^*$,
then

(1) $f(1, C_1, \ldots, C_n) = f(0, C_1, \ldots, C_n)$
implies that there exists a cell i such
that $E(f_p, i) = f_0$, $f_{15}$, $f_{12}$, or $f_3$.

(2) $f(1, C_1, \ldots, C_n) = (1^*)'$ and
$f(0, C_1, \ldots, C_n) = (0^*)'$ imply that there
exists a cell i such that $E(f_p, i) = f_{15-p}$ or $f_5$.

(3) $f(1, C_1, \ldots, C_n) = 1^*$ and
$f(0, C_1, \ldots, C_n) = 0^*$ imply that there is
no error in the cascade or that there
exists a cell i such that $E(f_p, i) = f_{10}$
and $p \ne 10$.

Proof: In part (1) f does not depend on $X_0$;
therefore, there must be a cell i such that
$E(f_p, i) = f_0$, $f_{15}$, $f_{12}$, or $f_3$. In part (2) f
depends on $(X_0^*)'$; therefore, there is a
cell i such that $E(f_p, i) = f_{15-p}$ or $f_5$.
The proof of part (3) is now obvious.

$X_0$ was chosen as the variable to be used in Theorem 2
because of the symmetry of the resulting theorem.
Since $X_1$ can be made (by a suitable choice of constants)
to pass theoretically through every cell*, the theorem
could be rewritten in terms of $X_1$. In terms of the
complexity of the detection scheme it is seen that
cascades could have a very simple detection test
schedule. It should be noted that Theorem 2 can very
easily be adapted to provide fault detection in cascades
if it is assumed that $f_{10}$ is not an allowable error for any
of the 12 cell functions.
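
The two-test scheme of Theorem 2 can be sketched as a small simulation. This is an illustration under stated assumptions, not the author's program: the cell numbering follows Definition 1, the cascade used in the usage note is the one implied by Example 4 (cells $f_{14}$, $f_{14}$, $f_8$, $f_6$), and the names `cascade_output`, `two_test_detection`, and the `faults` mapping used for fault injection are hypothetical.

```python
def cell_function(i, x, y):
    """f_i(x, y) under the numbering of Definition 1."""
    return (i >> (2 * x + y)) & 1


def cascade_output(functions, x0, xs, faults=None):
    """Simulate a single-rail cascade.

    functions[j] is the function index of cell j+1, xs[j] is its X input,
    and x0 feeds the Y input of cell 1. faults maps a cell number to the
    function index that cell actually produces.
    """
    faults = faults or {}
    y = x0
    for number, (p, x) in enumerate(zip(functions, xs), start=1):
        y = cell_function(faults.get(number, p), x, y)
    return y


def two_test_detection(functions, constants, faults=None):
    """Return 1, 2 or 3, the part of Theorem 2 that the two tests fall under."""
    expected = [cascade_output(functions, x0, constants) for x0 in (0, 1)]
    if expected[0] == expected[1]:
        raise ValueError("constants must make the output depend on X0")
    measured = [cascade_output(functions, x0, constants, faults) for x0 in (0, 1)]
    if measured[0] == measured[1]:
        return 1  # output no longer depends on X0
    if measured != expected:
        return 2  # output is the complement of the expected one
    return 3      # no error, or an f10 error that the two tests cannot see


if __name__ == "__main__":
    cells = (14, 14, 8, 6)
    constants = (0, 0, 1, 0)
    print(two_test_detection(cells, constants))           # fault-free cascade
    print(two_test_detection(cells, constants, {4: 0}))   # cell 4 stuck at 0
    print(two_test_detection(cells, constants, {3: 7}))   # cell 3 complemented
    print(two_test_detection(cells, constants, {2: 10}))  # cell 2 passes Y only
```

The last call shows why $f_{10}$ must be removed from the allowable error set for two tests to suffice: with these constants a cell that merely passes its Y input is indistinguishable from a good cell.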

Examples 

This section consists of examples of the use of
Theorems 1 and 2. $f_A$ denotes the measured value of
f whereas $f_T$ denotes the theoretical value of f.

* Assuming the cell function for cell 1 is not $f_{10}$ or $f_5$.


Example 1. Assume that there is no error in the
cascade shown in Figure 15.

Test
X0 X1 X2 X3 X4   f_T  f_A   Conclusion

 0  0  0  1  0    0    0
 0  0  1  1  0    1    1
 0  0  0  1  1    1    1    E(f_6, 4) = f_6
 0  0  1  0  0    0    0    E(f_8, 3) = f_8
 0  1  0  1  0    1    1    E(f_14, 2) = f_14
 1  0  0  1  0    1    1    E(f_14, 1) = f_14

Example 2. Assume that E(f_8, 3) = f_15 in the cascade
shown in Figure 15.

Test
X0 X1 X2 X3 X4   f_T  f_A   Conclusion

 0  0  0  1  0    0    1
 0  0  1  1  0    1    1
 0  0  0  1  1    1    0    E(f_6, 4) = f_6
 0  0  1  0  0    0    1    E(f_8, 3) = f_15

Example 3. Assume that E(f_14, 2) = f_3 in the cascade
shown in Figure 15.

Test
X0 X1 X2 X3 X4   f_T  f_A   Conclusion

 0  0  0  1  0    0    1
 0  0  1  1  0    1    0
 0  0  0  1  1    1    0    E(f_6, 4) = f_6
 0  0  1  0  0    0    0    E(f_8, 3) = f_5, so an extra test
                            is needed.
 0  0  0  0  0    0    0    E(f_8, 3) ≠ f_5 and the complemented
                            sequence Y_2 is being received.
 0  1  0  1  0    1    1    E(f_14, 2) = f_3

Example 4. This example satisfies the hypothesis of
Theorem 2. Assume that $E(f_6, 4) = f_0$ for
the cascade shown in Figure 15.

$f_T(X_0, X_1, X_2, X_3, X_4) = [(X_0 + X_1 + X_2) X_3] \oplus X_4$

$f_T(X_0, 0, 0, 1, 0) = X_0$

$f_A(0, 0, 0, 1, 0) = f_A(1, 0, 0, 1, 0) = 0$ implies that there
is a cell i such that $E(f_p, i) = f_0$, $f_{15}$, $f_{12}$, or $f_3$.
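
The $f_T$ and $f_A$ columns of Examples 1 through 3 can be regenerated by simulating the cascade with an injected fault. The sketch below assumes the cascade of Figure 15 consists of cells $f_{14}$, $f_{14}$, $f_8$, $f_6$, which is what the expression in Example 4 implies; the names `FIGURE_15`, `simulate`, and `run_schedule` are illustrative, not from the paper.

```python
FIGURE_15 = (14, 14, 8, 6)  # cells 1..4, inferred from the expression in Example 4


def cell_function(i, x, y):
    """f_i(x, y) under the numbering of Definition 1."""
    return (i >> (2 * x + y)) & 1


def simulate(test, faults=None):
    """Apply one test (X0, X1, X2, X3, X4) and return the cascade output.

    faults maps a cell number to the function index it actually produces.
    """
    faults = faults or {}
    y = test[0]
    for number, (p, x) in enumerate(zip(FIGURE_15, test[1:]), start=1):
        y = cell_function(faults.get(number, p), x, y)
    return y


def run_schedule(tests, faults=None):
    """Return the (f_T, f_A) pair for every test in the schedule."""
    return [(simulate(t), simulate(t, faults)) for t in tests]


EXAMPLE_2_TESTS = [(0, 0, 0, 1, 0), (0, 0, 1, 1, 0), (0, 0, 0, 1, 1), (0, 0, 1, 0, 0)]
EXAMPLE_3_TESTS = EXAMPLE_2_TESTS + [(0, 0, 0, 0, 0), (0, 1, 0, 1, 0)]

if __name__ == "__main__":
    print(run_schedule(EXAMPLE_2_TESTS, {3: 15}))  # Example 2: cell 3 stuck at 1
    print(run_schedule(EXAMPLE_3_TESTS, {2: 3}))   # Example 3: cell 2 produces X'
```

The sketch only reproduces the measured values; deciding which cell is at fault from those values still follows the test decision maps of Theorem 1.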
Figure 15 — A cascade to be tested


CONCLUSION 

Techniques for fault location and detection in cellular
arrays with an allowable error set of $f_0$, $f_{15}$, $f_{15-p}$, $f_3$, $f_{12}$,
$f_5$, or $f_{10}$ were described in this paper. It was shown that
the problem of testing an array could be reduced to the
problem of testing a cascade. The solutions presented
are particularly attractive because of their simplicity.
To locate an error, O(n) tests are needed for an n cell
cascade. Detection of an error requires only two tests
if the allowable error set is reduced by one error ($f_{10}$).

A necessary and sufficient condition for single-error
location was given. If the restrictions of this condition
are relaxed, then an isolation theorem such as given by
Thurber 6,7 can be derived; however, this isolation
condition will be more complex than the theorem given
by Thurber 6,7. A criterion that enables detection of a
single error in only two tests was derived.

Although the theories presented were derived for 
regular arrays of logic, they have potentially wide areas 
of application. A good understanding of the philosophies 
presented here will allow the extension of the results to 
cascades of m input n output cells. Also, some irregular 
arrays may be tested using this theory if they can be 
decomposed into sections composed of some form of a 
cascaded structure (or sections composed of structures 


closely resembling a cascaded structure). 
INTRODUCTION

The inherent capabilities of Large Scale Integration 
technology have recently shifted attention toward two 
major concepts in the design of functional computer 
subsystems; the concepts of Functional Modules and 
Cellular Arrays. 

The Functional Module concept emphasizes the 
possible standardization of frequently used common 
digital subsystem units such as registers, adders, 
counters, etc. Because of the unique iterative proper- 
ties also displayed by these units it is common to view 
them as building blocks (functional modules), built 
on a single substrate of material, the interconnection 
of which can expand significantly their functional 
capabilities. In addition to standardization, their 
massive production may suggest low cost subsystems. 

The Cellular Array concept allows the interconnec- 
tion of several types of mutually independent logic 
blocks, the cells, in various geometric configurations 
to perform a desired operation. 

This paper attempts to combine the above two
approaches in the realization of a Binary Cellular
Array multiplication unit easily adaptable to
LSI realization techniques, and speculates on the
possibility of realizing other similar functional
units, with the aim of lowering the cost per unit of
computation and possibly increasing overall system reliability.

Multiplication was chosen in the study because it 
forms the basis of division and square root operations 
by iterative methods as well as others indicated by 
design trend of present day computing systems. 


The methodology and retroactive design procedures 
of the Multiplication Array are presented. Intercon- 
nection arrangements at the cell level, for the array 
formation, as well as the module level by bringing all
module inputs and outputs at the terminals of the 
“package”, for the purpose of assembling larger mul- 
tiplication units, are also shown. 

Since testing poses a complex problem in any LSI
circuit, some diagnostic schemes are suggested for
reconfiguration and operation under reduced capabilities,
or even for automatic switching in of a permanently
connected spare module.

Other LSI considerations in terms of cell or module 
fan-in/fan-out, total number of pins required per 
package, chip sizes and densities and rough cost es- 
timates are also discussed. 

Single bit multiplier 

Figures 1 and 2 show the integral parts and the de- 
tailed cellular array structure of the multiplication 
unit, in which each row of the array corresponds to 
one bit of the multiplier. The array uses K-bit operands
and produces a 2K-bit product.

To achieve fast execution time the multiplication
is done by performing K-1 carry save additions (simple
EXCLUSIVE-OR operations) followed by a full
binary addition. Since the cells in the array operate
asynchronously, the unit as a whole can operate faster
without using a clock pulse.

We shall next explain the single-bit multiplication 
unit in some detail. 
Figure 1 — The integral parts of the asynchronous
multiplication array

Let the multiplicand be represented by the binary
vector $M = (m_1, m_2, \ldots, m_k)$ and the multiplier by the
binary vector $N = (n_1, n_2, \ldots, n_k)$.

A k x 2k P matrix is now generated starting from right
to left, whose elements $p_{ij}$ are computed from the
relation $p_{ij} = m_{j-i+1} \cdot n_i$, $p_{ij} \in \{0, 1\}$, with the following
conditions:

$$p_{ij} = \begin{cases} m_{j-i+1} & \text{if } n_i = 1 \text{ and } i \le j \le k+i-1 \\ 0 & \text{if } n_i = 0 \text{ and/or } j \text{ lies outside that range} \end{cases} \qquad \text{for } i = 1, 2, 3, \ldots, k$$

In terms of the array to be implemented, this condition
implies that for the range of i, j where $p_{ij} = 0$ no cell
will be required to perform a logic function. Thus the
[P] matrix has the following form:

$$[P] = \begin{bmatrix} p_{1,2k-1} & p_{1,2k-2} & \cdots & p_{1,k} & \cdots & p_{13} & p_{12} & p_{11} \\ p_{2,2k-1} & p_{2,2k-2} & \cdots & p_{2,k} & \cdots & p_{23} & p_{22} & p_{21} \\ \vdots & & & & & & & \vdots \\ p_{k,2k-1} & p_{k,2k-2} & \cdots & p_{k,k} & \cdots & p_{k3} & p_{k2} & p_{k1} \end{bmatrix}$$



Figure 2 — The “single-bit” asynchronous multiplication 
cellular array 


The following example will illustrate the above 
matrix formation. 

EXAMPLE

If M = (10101) and N = (11111),

then the P matrix is

P = 0 0 0 0 1 0 1 0 1
    0 0 0 1 0 1 0 1 0
    0 0 1 0 1 0 1 0 0
    0 1 0 1 0 1 0 0 0
    1 0 1 0 1 0 0 0 0
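
The matrix formation can be sketched in a few lines of Python. This is an illustration only. It assumes that row i holds the multiplicand shifted left by i - 1 positions when $n_i = 1$, that columns are printed from j = 2k - 1 down to j = 1 as in the example, and that $m_1$ is the least significant bit (the example operand is a palindrome, so it cannot confirm the bit order). The names `partial_product_matrix` and `row_value` are hypothetical.

```python
def partial_product_matrix(m_bits, n_bits):
    """Build the P matrix; m_bits[0] is m_1 and n_bits[0] is n_1.

    Both operands are assumed to have the same length k.
    """
    k = len(m_bits)
    rows = []
    for i in range(1, k + 1):
        row = []
        for j in range(2 * k - 1, 0, -1):  # leftmost column is j = 2k - 1
            if n_bits[i - 1] == 1 and i <= j <= k + i - 1:
                row.append(m_bits[j - i])  # m_{j-i+1} in the 1-based notation
            else:
                row.append(0)
        rows.append(row)
    return rows


def row_value(row):
    """Interpret a row as a binary number, leftmost column most significant."""
    value = 0
    for bit in row:
        value = 2 * value + bit
    return value


if __name__ == "__main__":
    P = partial_product_matrix([1, 0, 1, 0, 1], [1, 1, 1, 1, 1])
    for row in P:
        print(*row)
    # The rows sum to the product of the operands (21 x 31 here).
    print(sum(row_value(row) for row in P))
```

Summing the rows as binary numbers gives the product, which is the job the carry save and ripple adder stages perform in hardware.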

The above matrix can be realized by selective AND- 
ing of components of M and N. This “Shifting Net- 
work” accomplishes the proper positioning of the 
numbers to be added before their addition, just as in 
the conventional multiplication. Arrays of Carry 
Save Adders are used to perform the addition of these 
binary numbers utilizing Wallace's algorithm. 1 

The first stage of the Carry Save Adder adds the 
first two rows of the P matrix (first two generated 
partial products) thus generating two vectors — the 
first partial sums and the first carry having the form: 

$s = (s_{1,2k-1}\ s_{1,2k-2} \ldots s_{1,k} \ldots s_{11})$

$c = (c_{1,2k-1}\ c_{1,2k-2} \ldots c_{1,k} \ldots c_{11})$; $s_{ij}, c_{ij} \in \{0, 1\}$

The double subscript is used to identify the above 
vectors with corresponding positions of the P matrix 
that contributes to their generation. 
The logic functions yielding the elements $s_{2j}$ and
$c_{2j}$ are:

$s_{2j} = p_{1j} \bar{p}_{2j} + \bar{p}_{1j} p_{2j}$

$c_{2j} = p_{1j} \cdot p_{2j}$

where j = 2, 3, ..., 2k - 1. The composite cells are
shown in Figure 3a.

In the subsequent stages the Carry Save Adder will 
add three vectors: The sum vector generated at the 
previous stage, the carry vector generated at the 
previous stage shifted once to the left and the next row 
vector of the P matrix. 

The logic functions producing the new s and c vectors

$S = (s_{i,2k-1}\ s_{i,2k-2} \ldots s_{i,k} \ldots s_{i1})$

$C = (c_{i,2k-1}\ c_{i,2k-2} \ldots c_{i,k} \ldots c_{i1})$

for i = 3, 4, ..., k, and j = 1, 2, 3, ..., 2k - 1 are:

$c_{i+1,j} = s_{ij} c_{i,j-1} + s_{ij} p_{i+1,j} + c_{i,j-1} p_{i+1,j}$

$s_{i+1,j} = s_{ij} \bar{p}_{i+1,j} \bar{c}_{i,j-1} + \bar{s}_{ij} p_{i+1,j} \bar{c}_{i,j-1} + \bar{s}_{ij} \bar{p}_{i+1,j} c_{i,j-1} + s_{ij} p_{i+1,j} c_{i,j-1}$

The composite cell “C” is shown in Figure 3b. After
the Carry Save Addition has been performed for all
the partial product row vectors of the matrix P, a
Ripple Binary Adder is used to add the sum and carry
row vectors of the last stage of the Carry Save Adder. 
The typical cell of this Ripple Binary Adder has the 
same structure as “cell C” of the Carry Save Adder, 
except that it ripples through the carries generated to 
next high order position and puts out the correct 
binary sum which of course involves any carry inci- 
dent into it from the previous stage. The output of 
the Ripple Binary Adder is the final product of the 
multiplication. 
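
The carry save stages followed by the final ripple addition can be sketched with whole-word bit operations. This is a hedged illustration rather than a model of the individual cells: each Python integer stands for one row vector, the carry vector is shifted one position to the left before it enters the next stage, and the final `+` stands in for the Ripple Binary Adder. The name `carry_save_multiply` and the parameter `k` (operand width in bits) are assumptions.

```python
def carry_save_multiply(a, b, k):
    """Multiply two k-bit non-negative integers with carry save stages."""
    # Partial product rows: the multiplicand shifted by i where bit i of b is 1.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(k)]
    if k == 1:
        return rows[0]

    # First stage: only two rows, so sum is XOR and carry is AND.
    s = rows[0] ^ rows[1]
    c = rows[0] & rows[1]

    # Later stages add three vectors: sum, shifted carry, next row.
    for p in rows[2:]:
        shifted = c << 1
        s, c = s ^ shifted ^ p, (s & shifted) | (s & p) | (shifted & p)

    # Final full binary addition of the last sum and shifted carry vectors.
    return s + (c << 1)


if __name__ == "__main__":
    print(carry_save_multiply(0b10101, 0b11111, 5))
```

At every stage the quantity s + 2c equals the sum of the rows consumed so far, which is why one full addition at the end yields the product.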

The superposition of the “Shifting Array”, the “Car- 
ry Save Adder” and the “Ripple Binary Adder” re- 
sulting in the “Single Bit Multiplication Cellular 
Array” is as shown in Figures 1 and 2. 

It was found that with a Carry Save Adder there is
considerable gain in propagation time over the
choice of a Full Binary Adder. Assuming a uniform delay
d for each cell in the array, the total execution time
$T_p$ of a k bit by k bit multiplication is bounded between
the limits $(k-1)d < T_p < 2kd$. The lower limit (k-1)d
is the total delay in the Carry Save Adder while the
upper limit 2kd depends on the choice of the device
for the final Full Binary Addition. This compares with
a maximum delay in a conventional multiplier of
k(k-1)d for full binary additions plus k single-bit
left shifts. The asynchronous multiplication array,
as implemented, is shown in Figure 2.

Figure 3a, b — Cell “S”, Cell “C”

Two-bit multiplier 

Upon examination of this array it was decided that
the propagation time and therefore the computational
speed could be further improved by reducing the Carry
Save Adder stages, in other words, the rows of the
array. This also improves the attenuation factor of
the cell inputs as they ripple through the array.

An alternate multiplying algorithm, examining the
multiplier bits in subsets of two, was investigated,
resulting in the block diagram of Figure 4, which
displays the integral parts of the modified array. To
illustrate the algorithm better, this was assumed to be
an m x n array instead of a square array, and the
multiplier parts now are:

1. The (m + n + 2)-bit register for the multiplicand

2. The (n + 2)-bit register for the multiplier

3. The (m + n + 3)-bit registers for the final
product

4. The Binary Shifting Array (BSA)

5. The Input Control Circuit (ICC)

6. The Carry Save Adder (CSA)

7. The “End Around Carry” Accumulator (EACA).

Before investigating the above circuits the general 
algorithm concept must be established. This algorithm 






Figure 4— The modified “two bit” multiplier 


calls for three types of decisions in each multiplication 
stage: ADD or SUBTRACT a single multiple of the 
multiplicand, or SHIFT without generating any mul- 
tiples of the multiplicand. This is opposed to 
conventional multiplication, which requires only shifts 
of the multiplicand and their addition. For the four 
possible 2-bit combinations one has the following 
obvious interpretation: 

a. Combination 00_2 = 0_10: Add nothing to the 
partial product 

b. Combination 01_2 = 1_10: Add one times the mul- 
tiplicand to the partial product 

c. Combination 10_2 = 2_10: Add two times the mul- 
tiplicand to the partial product 

d. Combination 11_2 = 3_10: Add three times the 
multiplicand to the partial product. 

Combinations (a) and (b) impose no difficulty in their 
generation. Combination (c) requires a 1-bit shift of 
the multiplicand to the left according to the simple 
fact: to generate the 2^n-th multiple of a binary 
number (the multiplicand in this case), where n is any 
integer n > 0, shift the number n bit positions to the 
left. For example, to generate 16 X m = 2^4 X m, shift 
m four bit positions to the left. For combination (d) 
one notices that three times the multiplicand can be 
expressed in the following two ways: 

(1) (4 X m) - (1 X m)    (2) (2 X m) + (1 X m). 

The first representation was chosen for this multi- 
plication algorithm, according to which a comple- 
mentation (2's complement) of one times the multi- 
plicand is performed and added to the corresponding 
present stage of the multiplication array while a re- 


quest is issued to add one times the multiplicand in 
the following stage, in that order. The latter request is 
taken care of by adding “1” to the bit pair of the 
multiplier corresponding to the next multiplication 
stage, thus increasing the pair's integer value by one. 
This is commonly known as a “carryout.” 

The subtraction of multiplicand from the partial 
product is performed in two stages. The one's comple- 
ment of m is “added” into the Carry Save Adder of 
the row. A “one” in the lowest order bit position cor- 
responding to the row is generated and inserted into 
the End-Around-Carry Accumulator (EACA), at 
the appropriate column. Together this constitutes 
adding the two's complement of m after appropriate 
shifting. Thus any sequential borrow propagation is 
prevented at the Carry Save Adder stages. Since the 
“End-Around-one's” if generated by any or all rows 
are inserted at distinct columns of the EACA the 
latter performs at most one accumulation during a 
complete multiplication cycle. It must be remembered 
that partial products generated at each row are bussed 
to the next row of cells with a 2-bit left shift. 

The following two tables indicate the decisions that 
have to be made when the various bit pair combina- 
tions are encountered at a given stage when no carry- 
out (Table I) or a carryout (Table II) has been gen- 
erated in the previous stage. 5 


      Multiple of                      Multiple of 
Bits  m generated  Carryout      Bits  m generated  Carryout 

 00        0          0           00        1          0 
 01        1          0           01        2          0 
 10        2          0           10       -1          1 
 11       -1          1           11        0          1 

          Table I                         Table II 


Finally, to illustrate the overall performance of the 
modified multiplier with minimum effort, an example 
of a 4-bit positive multiplicand times a 6-bit positive 
multiplier producing a 6 + 4 = 10-bit long product 
is presented. The algorithm and techniques involved 
can be easily extended to an arbitrary bit length 
multiplicand or multiplier. 
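
The decision rules of Tables I and II can be sketched in a few lines of Python. This is an illustrative software model, not the authors' circuit: the function names `recode_pairs` and `multiply_by_pairs` are hypothetical, and the operands 1011 (11) and 101011 (43) are the ones read from the example of Figure 5a.

```python
def recode_pairs(multiplier: int, n_bits: int) -> list[int]:
    """Recode a positive n-bit multiplier two bits per stage.

    Returns one multiple of the multiplicand per stage, each in
    {-1, 0, 1, 2}, least significant stage first.
    """
    # One extra 00 pair at the top absorbs a possible final carryout.
    stages = (n_bits + 1) // 2 + 1
    carry = 0
    multiples = []
    for stage in range(stages):
        pair = (multiplier >> (2 * stage)) & 0b11
        value = pair + carry              # 0..4: bit pair plus incoming carryout
        if value >= 3:
            multiples.append(value - 4)   # 3 -> -1 and 4 -> 0, both with carryout
            carry = 1
        else:
            multiples.append(value)       # 0, 1 or 2, no carryout
            carry = 0
    return multiples


def multiply_by_pairs(multiplicand: int, multiplier: int, n_bits: int) -> int:
    """Sum the selected multiples, each shifted two bits per stage."""
    return sum(k * multiplicand * 4 ** stage
               for stage, k in enumerate(recode_pairs(multiplier, n_bits)))


if __name__ == "__main__":
    print(recode_pairs(0b101011, 6))
    print(multiply_by_pairs(0b1011, 0b101011, 6))
```

For the multiplier 101011 the three low-order stages each select the negative multiple, which is why three end-around ones appear in the worked layout of this example.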

The binary shifting array 

The BSA generates the elements p_ij ∈ {0, 1} such 
that 

p_ij = 0 for 1 < j < m + 1; i ≥ 3 
j < m + n; i > 3 
where 

m = no. of bits in M 
n = no. of bits in X 

with the rest of the p_ij's varying according to the 
corresponding multiplicand bits. Its implementation 
procedure is as follows: 

1. Provide for one additional bit pair at the most 
significant part of the multiplier by inserting 
two zeroes in the register. This will take care 
of a possible generation of a “carryout” at the 
two most significant bits of the multiplier. Pro- 
vide for as many zeros to the left-hand side of 
the multiplicand register to make it (m + n + 2) 
bits long. 

2. Examine the multiplier bits two at a time from 
the least significant to the most significant bits. 

3. Generate the following three numbers for each 
multiplier bit pair: 

a. The multiplicand 

b. The multiplicand inverted (one's complement) 

c. Twice the multiplicand. 

Repeat for the next bit pair until all n-multiplier bits 
are used. For this particular example the proce- 
dure will yield the formation of possible CSA inputs, 
where the “boxed in” numbers will be the rows of the 
P matrix chosen by the control lines of the ICC (Fig- 
ure 5a). 

The two numbers are placed in the registers with the 
least significant bit of the multiplier starting at the 
top. For every bit pair of the multiplier there is a 
corresponding triplet of “AND” gate rows and one of 
inverters, all together being capable of generating any 
of the desired forms of the multiplicand called for in 
Tables I and II. 

The “AND” gates have two inputs and one output, 
one of the inputs being a multiplicand bit bussed 
across and the other being the appropriate line acti- 
vated by the ICC. The outputs of the leftmost column 
are used to keep count of the “End Around Carries” 
and are directly connected to the appropriate positions 
of the EACA. 

The input control circuit 

The ICC is a column of (n/2 + 2) rectangular cells 
(see Figure 4). Its operation is to select the appropriate 
multiplicand multiple for each possible bit pair com- 
bination, by way of three output lines: L_1, L_2, L_3. 


1011 Multiplicand 

00 10 10 11 Multiplier 



Figure 5a — Multiplication example 



Figure 5b — Cell-K of the input control circuit 
L_1 activates the single multiple of the multiplicand (first 
“AND” gate row of each group of rows in the BSA). 
L_2 activates the 2's complement of the multiplicand 
(second “AND” gate row, directly under each row of 
inverters). L_3 activates the double multiple of the 
multiplicand. Therefore, the typical cell of the ICC has 
B_1, B_2, and C_0 as inputs and L_1, L_2, and L_3 as outputs. 
Its logic functions are shown below. B_1 and B_2 are any 
two consecutive bits and C_0 is the Carryout. The logic: 

                  M    \bar{M}  2M 
B_1  B_2  C_0    L_1   L_2     L_3 

 0    0    0      0     0       0 
 0    0    1      1     0       0 
 0    1    0      0     0       1 
 0    1    1      0     1       0 
 1    0    0      1     0       0 
 1    0    1      0     0       1 
 1    1    0      0     1       0 
 1    1    1      0     0       0 

Note: The interpretation of B_1, B_2 = 01 is not one times 
the multiplicand as it would appear at first sight, but is 
instead two times the multiplicand because of the way 
the multiplier is placed in the register, vertically with 
the least significant bit on the top. The B_1, B_2 = 10 
combination is interpreted in a similar manner. 

L_1 = \bar{B}_1 \bar{B}_2 C_0 + B_1 \bar{B}_2 \bar{C}_0 

L_2 = \bar{B}_1 B_2 C_0 + B_1 B_2 \bar{C}_0 

L_3 = \bar{B}_1 B_2 \bar{C}_0 + B_1 \bar{B}_2 C_0 
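
As a check on the cell logic, the three output functions can be written out and compared against the multiples required by Tables I and II. The sketch below is only a software model: `icc_cell` is a hypothetical name, the complemented terms are taken from the truth table above, and the `carryout` expression is an assumption derived from Tables I and II since the text gives no equation for it.

```python
def icc_cell(b1: int, b2: int, c0: int) -> tuple[int, int, int]:
    """Outputs (L1, L2, L3) of one ICC cell.

    b1 is the less significant bit of the pair, b2 the more significant
    one, and c0 the carryout arriving from the previous stage.
    """
    nb1, nb2, nc0 = 1 - b1, 1 - b2, 1 - c0
    l1 = (nb1 & nb2 & c0) | (b1 & nb2 & nc0)   # select  +M
    l2 = (nb1 & b2 & c0) | (b1 & b2 & nc0)     # select  -M (complemented row)
    l3 = (nb1 & b2 & nc0) | (b1 & nb2 & c0)    # select  2M
    return l1, l2, l3


def carryout(b1: int, b2: int, c0: int) -> int:
    """Assumed carryout to the next stage: set when pair value + c0 >= 3."""
    return b2 & (b1 | c0)


if __name__ == "__main__":
    for b1 in (0, 1):
        for b2 in (0, 1):
            for c0 in (0, 1):
                print(b1, b2, c0, icc_cell(b1, b2, c0), carryout(b1, b2, c0))
```

At most one of the three lines is active for any input, so the selected multiple is simply L1 - L2 + 2·L3.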



Figure 6 — The binary multiplying cellular array 


The typical cell “K” of the ICC is shown in detail in 
Figure 5b. 

The carry save adder, end around carry 
accumulator and full binary adder 

A layout of the inputs to the CSA stages, the EACA 
and FBA is displayed below. The groups of binary 
numbers between the lines represent the actual inputs 
to a particular row of cells. The first three groups are 
CSA row inputs. The fourth group represents the EACA 
inputs and the final group, those of the FBA. All binary 
numbers representing partial products are of course 
P matrix row vectors activated by the ICC lines due 
to a particular multiplier bit pair combination. 

  11111110100     1st partial product 
  111110100       2nd partial product 
------------- 
  00000100100     1st partial sum 
 111110100000     1st carry 
  1110100         3rd partial product 
------------- 
 100011000100     2nd partial sum 
0111001000000     2nd carry 
  01011           4th partial product 
------------- 
 010001000100     3rd partial sum 
1010110000000     3rd carry 
       010101     End Around Carries 
------------- 
1000111010001     4th partial sum 
0100000001000     4th carry 
------------- 
1100111011001     Final Sum (Result) 
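
The arithmetic of this layout can be reproduced with a short carry-save model. The sketch below is a simplification under stated assumptions, not the authors' array: the names `carry_save_add` and `multiply_csa` are hypothetical, every row is sign-extended to the full register width (so intermediate bit patterns may differ from the printed ones in their high-order bits), and the stage multiples [-1, -1, -1, 1] are those selected by the multiplier 101011.

```python
def carry_save_add(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """One carry-save row: three inputs in, (partial sum, carry) out."""
    partial_sum = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return partial_sum, carry


def multiply_csa(m: int, multiples: list[int], width: int) -> int:
    """Accumulate the selected multiples of m through carry-save rows.

    A negative multiple enters as the one's complement of m; the missing
    "+1" is collected in an end-around-carry word and added in one extra
    row, so no borrow ever propagates inside the carry-save rows.
    """
    mask = (1 << width) - 1
    partial_sum, carry, end_around = 0, 0, 0
    for stage, k in enumerate(multiples):
        shift = 2 * stage                      # 2-bit left shift per row
        if k < 0:
            row = ((~m & mask) << shift) & mask
            end_around |= 1 << shift           # one distinct column per row
        else:
            row = ((k * m) << shift) & mask
        partial_sum, carry = carry_save_add(partial_sum, carry, row, mask)
    # EACA row, then the final full binary addition.
    partial_sum, carry = carry_save_add(partial_sum, carry, end_around, mask)
    return (partial_sum + carry) & mask


if __name__ == "__main__":
    # 4-bit multiplicand 1011, stage multiples for multiplier 101011,
    # 13-bit (m + n + 3) product register.
    print(bin(multiply_csa(0b1011, [-1, -1, -1, 1], 13)))
```

The only carry-propagating step is the last addition, which is the point of using carry-save rows everywhere else.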

Figure 6 shows the array after superimposing the in- 
dividual circuits. 

It can be easily noticed that there is a reduction 
by a factor of two in the total number of cell rows 
required for the array and therefore in the total final 
propagation time T_p, at the expense of some additional 
control logic, a number of inverters and an additional 
stage for the EACA. No further complexity in the 
cell structure results, thus the originally developed 
cells were used, with a minor modification for cell S 
as shown in Figure 7a. This cell may also be present 
in the single bit multiplication array. 

It must also be noticed that the overflow of bits 
resulting in the left-most significant part of the final 
Figure 7a — Cell “S'” — A form of Cell “S” 



Figure 7b — Cell “R” — Reconfiguration cell 

product register may be advantageously utilized for 
sign and decimal point considerations. 

Diagnostics and reconfiguration 

In order to incorporate diagnostics in the array 
and study the interconnection problem, a standard 
size module had to be assumed. It was felt that the 
implementation of a 64 X 64 bit multiplier would be 


a good choice for all practical purposes. An intercon- 
necting scheme of standard dimension 64 X 8 bit 
modules to realize the 64 bit multiplier was then de- 
vised aiming to minimize the number of pins per 
module necessary for the interconnection. 

As seen in Figure 8, the resulting 64 X 64 multi- 
plication unit requires 2-Full Binary addition stages 
and 4-Carry Save addition stages per module, a 
total of 32-Carry Save additions and 15-Binary Addi- 
tions (only one for the first module). However, there 
is a real time overlap between these various stages, 
and by utilizing a pipelining technique and a series 
of flip-flops after each FBA, a 100 percent utilization 
of the unit during computation is achieved, and the 
multiplication cycle is considerably faster. This is 
illustrated shortly in connection with Table III. 

The basic module as displayed in Figure 6 has to be 
modified further for the interconnection. An extra 
FBA and additional gating for diagnostic purposes is 



Figure 8 — Example of an assembled 64 X 64-bit 
multiplication unit using the pipelining scheme 
introduced in every module between the output of its 
respective FBA and what is shown as a product 
register. The typical newly developed cell for the 
diagnostics and reconfiguration is shown in Figure 
7b, while the above mentioned modifications are dis- 
played in detail in Figure 9 for a typical module. 

As seen, three additional control lines are needed 
to perform the following functions. 

a. To relay a Fault or No-Fault signal, indicating 
that a fault has or has not occurred in one 
particular module (NF/F) (e.g., if F = 0, NF = 1). 

b. To relay a No Shift signal for the output of this 
module, (NS = 1) if no fault has occurred in 
the preceding module. 

c. To relay a shift, eight-bits to the right, (S = 1) 
for the output of this and all subsequent modules 
if a fault has been detected in the preceding 
module. 

The detection of the fault could be accomplished by 
a software routine which may check the final product 
of the unit periodically and appropriately set the flip- 
flops of the control signals. 

By shifting the outputs of all modules subsequent 
to the malfunctioning one eight bit positions to the 
right, forcing the output of the faulty module to 
zero at the same time, and simultaneously 
introducing the spare module which is permanently 
connected to the unit, one can still achieve 100 percent 
computational efficiency. If another module fails to 
function properly, by applying again the same recon- 
figuration scheme the unit will function with a reduced 
capability since the eight-least significant bits of the 
multiplier will be lost. No provision has been made at 
this point if two modules fail to function properly 



Figure 9 — The combinational logic gating 
for reconfiguration 


at the same time. At least one of them must be replaced 
to put the multiplication unit back in service. 

Aiming to maximize the number of multiplications 
per unit time, as already mentioned, one can introduce 
storage elements at intermediate points. This allows 
the unit to accept a new set of operands without waiting 
for the total completion of the present computation. 

Consider an m X m bit multiplier module. If the 
intermediate computations are stored after the Carry 
Save adders, the first Binary adder and the second 
Binary adder, the rate of multiplications in the module 
per unit time will be 

R = \frac{1}{\max[t_{cs},\, t_b]}, where 

t_{cs} = Total propagation time through the CSA. 

t_b = Total propagation time through the FBA for 
the binary addition of two m-bit binary 
numbers. 

Then the number of storage elements required per 
module is 2m + m + m = 4m. If, however, storage 
elements are inserted at the outputs of the two Binary 
Adders only, as shown in Figure 8, the maximum rate 
of multiplications in each module per unit time will be 

R_{max} = \frac{1}{t_{cs} + t_b} 

while the total number of storage elements required 
will be decreased by half, that is 2m. 
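
The trade-off between the two latch placements can be made concrete with a small calculation. The sketch below assumes that the slowest pipeline stage sets the rate, and that with latches only after the two Binary Adders the stage time is t_b + t_cs, the time unit the text gives for Table III. The function names and the delay values are hypothetical and serve only as an illustration.

```python
def rate_with_csa_latches(t_cs: float, t_b: float) -> float:
    """Latches after the CSA and after both binary adders:
    the slowest stage limits the rate."""
    return 1.0 / max(t_cs, t_b)


def rate_with_adder_latches_only(t_cs: float, t_b: float) -> float:
    """Latches only after the two binary adders:
    the CSA and the first binary adder form one stage."""
    return 1.0 / (t_cs + t_b)


def storage_elements(m: int, latch_after_csa: bool) -> int:
    """Storage elements per m-bit module for each placement (4m or 2m)."""
    return 2 * m + m + m if latch_after_csa else 2 * m


if __name__ == "__main__":
    t_cs, t_b = 100.0, 125.0          # hypothetical delays in nanoseconds
    print(rate_with_csa_latches(t_cs, t_b), storage_elements(8, True))
    print(rate_with_adder_latches_only(t_cs, t_b), storage_elements(8, False))
```

Halving the storage therefore costs throughput whenever the CSA delay is not negligible next to the adder delay.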

The table below gives the sequence of events in 
the first four modules of the 64 X 64 composite mul- 
tiplier unit of eight modules, based on the pipelining 
technique. 

Table III 

                        MODULES 

TIME UNITS    1       2           3           4 

    1        B11     B11         B11         B11 
    2        B21     B21, B12    B21         B21 
    3        B31     B31, B22    B31         B31 
    4        B41     B41, B32    B41, B23    B41, 
    5        B51     B51, B42    B51, B33    B51, 

Each time unit in the above table corresponds to the 
factor t_b + t_cs, and B_ij represents the j-th binary 
addition of the i-th multiplication. 
Figure 10 — An alternate interconnecting scheme for 
the 8-modules of the 64 X 64 multiplication unit 


Another interconnecting scheme which has not been 
investigated yet in detail but seems to be equally as 
efficient, considerably faster and adaptable to the 
proposed reconfiguration technique is the one shown in 
Figure 10, where each level of nodes represents FBA’s 
performing in parallel with an anticipated multiplication 
cycle of 

[1 + \log_2 n] t_b + t_{cs}. 

LSI implementation 

The implementation shown for the 64 X 8 module 
reveals a number of characteristics suitable for large 
scale integration. Among them are the repetitive 
interconnections of simple identical cells and the 
modularity suitable for expansion and reconfiguration. 

Below some of the approximate hardware require- 
ments are pointed out. 

Approximate number of PINS/MODULE 

1. m + n + 2 needed for the multiplicand register 

2. m + n + 2 needed as inputs to the second FBA 

3. m + n + 2 needed for the product 

4. n + 2 needed for the multiplier register 

5. three-control pins for reconfiguration 

Approximate number of CELLS/MODULE 

The cells are the kinds already discussed: C, S, 
S', R, K. All are present in a module. 

1. m X n/2 cells needed for the CSA stages 

2. m + n cells needed for the EACA stage 

3. m + n reconfiguration cells 

4. 2 (m + n + 2) cells needed for the two FBA's 

5. n/2 + 1 cells needed for the ICC. 


Approximate number of GATES/CELL* 

For cell “C” approximately seven-gates are required 

For cell “S”, “S'” approximately three-gates are 
required 

For cell “R” approximately two-gates are required 

For cell “K” approximately nine-gates are required 

The above estimates point out the fact that testing 
at the individual cell or circuit level (item yet to be 
examined) becomes a problem, especially when the 
complexity of the chip is increased, with a paralleled 
decrease in reliability and yield of non-defective chips. 
However, using the modular approach it is advisable 
to perform the testing externally on the module and 
discard the malfunctioning units. This would consider- 
ably decrease the amount of logic on a chip, which would 
otherwise have to be inserted for the testing of the 
individual circuits. This approach seems to be eco- 
nomically feasible since it is estimated that by 1970 
an LSI chip of 100 X 100 mils in size may contain 
200 components, at five cents per component, while 
by 1975 an LSI chip of 300 X 300 mils in size may 
contain as many as 3,600 components at the cost of 
about one cent per component. Therefore, miniaturi- 
zation of LSI chips will discourage the testing on the 
individual circuit level, while the loss due to the 
discarding of modules after testing at the frame level, 
will be negligible. 

In view of the above considerations and since the 
present state-of-art high density MOS circuits are 
being driven at 10 MHz, implementation of the 
multiplier modules as the one presented by MOS cir- 
cuits appears very desirable from a manufacturing 
viewpoint. A reasonable building block might be a 
64 X 64 bit multiplication unit requiring an approxi- 
mate number of 5000 active elements (field effect 
transistors). One could also visualize the whole unit 
incorporated in one or two chips. Where speed is the 
primary requirement, the unit can be designed using 
fast bipolar transistors, with an expected five ns delay. 
Assuming then a 64 X 64 bit module is implemented 
by bipolar transistors, the execution time could be 
in the neighborhood of 0.225 μs; when pipelined, 
the maximum number of multiplications per second may 
be approximately 5 X 10^6. An MOS array of the same 
module will perform an order of magnitude slower 
than in the bipolar case. 


* The above gates are mostly “AND” gates with the “OR” gate 
not included in the count. There are also 2(m + n) additional 
gates needed for the reconfiguration scheme and m X n gates for 
shifting each array. 
The pin count also indicates that the current design 
is within the state-of-art of the MOS technology. 

The performance figures given above are educated 
guesses since the circuit and intermodule delays are 
dependent on the circuit types, their interconnections, 
the chip topology, etc. In addition, the design examples 
described in the previous sections indicate the ease 
with which the array could be partitioned to fit 
reasonable unit or chip sizes. 

CONCLUSION 

Since fast multiplication has become the basis of 
iterative divisions and square roots in fast computers,6,7 
there appears to be a need for cheap array type, LSI 
realizable multiplication subsystems. This paper reports 
the design methodology and the detailed implementa- 
tion of one such structure. Ease of diagnosis and capa- 
bility of reconfiguration were used as twin requirements 
in the final design. When the unit is composed of a 
number of modules and a malfunction is detected in 
one of them, a method of switching automatically in 
a spare module was presented. An estimate of the 
logic circuitry in the hard core (that portion of the 
unit which must be operating without any faults) 
during testing is found to be less than 14 percent for 
a 32 X 32 module, 9.7 percent for a 64 X 64 module 
and 4 percent for a 128 X 128 module. Therefore, 
as the size of the multiplication module-unit increases 
the relative size of the hard core decreases very rapidly. 

To conclude, the cellular array implementation of an 
asynchronous multiplication unit using mostly non- 
carry-propagating Carry Save adders was accomplished. 
The final cell design and the control and the recon- 
figuring circuitry are quite simple. 

A number of additional studies need to be done in 
the future. The design of self-diagnosable and repairable 
functional arrays appears quite feasible and worth 
considering. The possibility of composite design of 
a multiplication, division and square rooting unit using 
techniques presented in this paper could be very use- 
ful, particularly if the division and square root al- 
gorithms are based on the availability of fast multi- 
plication units such as those discussed in this paper. 
The Pad Relocation technique for 
interconnecting LSI arrays of imperfect 
yield 

by D. F. CALHOUN 

Hughes Aircraft Company 
Culver City, California 


INTRODUCTION 

The interconnection of circuits required in Large Scale 
Integration (LSI) using multi-level metalization above 
monolithic semiconductor arrays is taking basically 
two approaches. One is predicated on processing with 
a reasonable yield entire arrays without any semicon- 
ductor defects (i.e., 100 percent yield chips) which 
allows once-generated fixed-wiring patterns to obtain 
the required interconnect. The second approach aims 
at much larger semiconductor arrays (i.e., full-slice 
LSI) for which defect-free processing cannot be ex- 
pected. Thus, probe tests are made of the semicon- 
ductor circuits processed on each LSI slice (or wafer) 
and record is made of the good and bad circuit posi- 
tions. Unique interconnection masks are then generated 
to interconnect good circuits in each wafer’s particular 
yield pattern using certain “discretion” in avoiding 
the bad circuits. As a result, the 100 percent yield 
approach emphasizes the need to use standard inter- 
connect masks but is complexity limited by the oc- 
currence of defective circuits in larger arrays, whereas 
approaches capable of routing around the defective 
circuits have required a full set of unique signal inter- 
connect masks for each wafer’s particular yield pattern. 

The Pad Relocation approach, however, allows the 
interconnection of full-slice LSI arrays containing de- 
fective circuits to be accomplished with a minimal 
amount of unique interconnect per array. Only a 
portion of one of the typically three interconnect levels 
varies from array to array, thus allowing significant 


improvements in the cost, reliability, and testability 
of the finished arrays as well as less limitation on cell 
yields and array complexities. 

Description of the Pad Relocation technique 

Pad Relocation is a technique which allows a pre- 
determined standard pattern of good circuits to be 
established on all LSI slices used to perform the same 
array function regardless of the varying yield patterns 
determined by DC wafer probe tests. This is accom- 
plished by relocating the pads of nearby good circuits 
to the positions where good circuits were specified 
by a prescribed master pattern, but were not found 
during wafer probe tests. The pad positions above a 
bad circuit (or any unused circuit) are isolated from 
that circuit by a layer of dielectric. Where good cir- 
cuits are found in expected good circuit locations, 
those circuits are used without relocation. Thus, the 
Pad Relocation technique functionally establishes a 
specified pattern of good circuits as if there had actually 
been a 100 percent circuit yield in that pattern. A 
single wiring pattern can then be generated for all 
the LSI arrays of the same function to accomplish the 
much more complex signal interconnect between the 
master pattern circuits. By determining standard 
cross-under areas within the Pad Relocation layer 
where relocation lines need never occur, it has been 
shown that large arrays can be interconnected with 
the same number of total interconnect layers as re- 
quired by discretionary techniques. 
With each wafer’s good circuits located in the pre- 
determined master pattern, an optimal standard 
interconnect of the circuits can be made for each 
wafer. Since this signal routing and mask-making 
expense is incurred only once for each function, much 
more effort can be spent optimizing the signal routing. 
As a result, the total number of interconnect levels 
(including Pad Relocation) may actually be fewer 
(for very complex arrays) than other techniques by 
which the interconnect is generated for each wafer’s 
particular yield pattern. 

The Pad Relocation technique has been 100 per- 
cent successful for all integrated circuit and special 
LSI wafers considered so far. The “master pattern” 
gives the prescribed locations of good circuits to 
which each LSI array’s particular yield will be tailored. 
Statistically, if M is the percentage of wafer circuits 
in the master pattern and Y is the wafer circuit yield 
from probe tests, then only M(100-Y)/100 percent 
of all wafer circuits need to be relocated. For example, 
if Y = 35 percent and M = 30 percent, then the 
relocation (as a statistical average) of 19.5 percent 
of the wafer circuits will establish a master pattern 
that uses 86 percent of all the good wafer circuits. 
This would allow 120 good circuits to be located in 
prescribed positions, leaving an average of only 20 
good circuits unused. 
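
The figures in this example follow directly from the formula and can be checked with a few lines of Python. The function name `relocation_stats` is hypothetical, and the wafer size of 400 circuits is an inference from the numbers quoted (120 circuits being 30 percent of the wafer), not a value stated in the text.

```python
def relocation_stats(master_pct: float, yield_pct: float, total_circuits: int):
    """Statistical averages for one wafer.

    Returns (percent of wafer circuits relocated,
             percent of good circuits used by the master pattern,
             good circuits placed in prescribed positions,
             good circuits left unused).
    """
    relocated_pct = master_pct * (100 - yield_pct) / 100   # M(100 - Y)/100
    good_used_pct = 100 * master_pct / yield_pct           # share of good circuits used
    placed = total_circuits * master_pct / 100
    good = total_circuits * yield_pct / 100
    return relocated_pct, good_used_pct, placed, good - placed


if __name__ == "__main__":
    # Y = 35 percent, M = 30 percent, assumed 400-circuit wafer.
    print(relocation_stats(30, 35, 400))
```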

An example 

The methodology of the Pad Relocation technique 
is best described by example. Figure 1 shows the map- 
ping of circuits on an LSI wafer. Each dot represents 
the position of a semiconductor cell such as a full 
adder, or a quad two-input NAND gate cell, or a flip- 
flop, etc. Figure 2 identifies with a slash (/) the loca- 
tion of all circuits determined to be good by dc wafer 
probe tests on a particular slice. The yield of wafer 
circuits varies from 10 percent to 90 percent depending 
on the circuit complexity, and the locations of the 
good circuits cannot be predicted from wafer to wafer. 
This makes it impossible to use standard intercon- 
nect patterns without first transforming the various 
wafer yield patterns to a single standard pattern. 
The circuit yield (the percent of total circuits which 
are good) for the wafer in Figure 2 is nearly 30 percent 
and yet there is not a single area of 100 percent yield 
that is larger than three circuits by two circuits. Thus, 
the 100 percent yield approach could obtain units with only about 
5 percent of the complexity allowed by full-slice inter- 
connection techniques. The goal is to tailor by some 
efficient means the locations of the good circuits in 
Figure 2 to a standard pattern that may be used for 



Figure 1 — Integrated circuit wafer 

Figure 2 — Wafer after test — Slashes show good circuit 
positions 


all wafers with about the same circuit yield. For higher 
yield wafers, there are other standard patterns which 
use more good circuits. 

Figure 3 shows a master pattern (in heavy dots) 
which can be used for wafers having at least a 25 per- 
cent yield. That pattern is characterized by a more 
Figure 3 — A master pattern of good circuits — All wafers 
will be matched to this pattern by the Pad 
Relocation technique 



Figure 4 — Master pattern superimposed on the particular 
yield of the Figure 2 wafer 


dense usage of good circuits toward the center of the 
wafer with good circuit positions never adjoined on 
more than one side by another circuit in the master 
pattern. The latter characteristic facilitates the routing 
of standard signal interconnect as well as the reloca- 
tion of circuits in at least three directions. The matching 
of the master pattern to the expected yield distri- 
bution as a function of distance from the wafer center 
optimizes the conflicting goals of minimum number of 
relocations and maximum probability of fulfilling the 
master pattern. 

Figure 4 shows the Figure 3 master pattern super- 
imposed on the particular wafer yield of Figure 2. 
The objective now is to route a nearby good circuit, 
shown by a slash, to each heavy dot (i.e., master pat- 
tern position) which initially is without a good cir- 
cuit. This specification can be completed manually 
giving a coding sheet description of necessary circuit 
relocations; or a simple computer routing program can 
output a punched tape or cards that can be used to 
make a mask automatically. The computer routine for 
Pad Relocation will use about two orders of magni- 
tude less run time than a customized signal routing 
primarily because no circuit placement or logic signal 
routing are required. Pad Relocation requires only 
that a good circuit be identified for relocation to each 
position in the master pattern which did not initially 
have a good circuit. A later paper will present work 
that is under way to automate the Pad Relocation 


selection and specification with the use of interactive 
graphics. 
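
The selection step described above, finding a nearby good circuit for every master pattern position that lacks one, can be sketched as a greedy nearest-neighbor assignment. This is only an illustrative stand-in and not the author's routing program: the name `plan_relocations`, the grid coordinates and the use of Manhattan distance are assumptions, and the sketch does not check that relocation lines avoid crossing one another.

```python
Cell = tuple[int, int]  # (row, column) position of a circuit on the wafer grid


def plan_relocations(master: set[Cell], good: set[Cell]) -> dict[Cell, Cell]:
    """Map each unfilled master position to the good circuit relocated there."""
    unfilled = sorted(master - good)   # master positions found bad at probe test
    spare = good - master              # good circuits outside the master pattern
    plan: dict[Cell, Cell] = {}
    for target in unfilled:
        if not spare:
            raise ValueError("not enough good circuits to fill the master pattern")
        # Nearest spare good circuit; ties broken by coordinate for determinism.
        source = min(
            spare,
            key=lambda c: (abs(c[0] - target[0]) + abs(c[1] - target[1]), c),
        )
        plan[target] = source
        spare.remove(source)
    return plan


if __name__ == "__main__":
    master = {(0, 0), (2, 2), (4, 0)}
    good = {(0, 0), (2, 3), (4, 4), (1, 0)}
    print(plan_relocations(master, good))
```

Because no circuit placement or logic signal routing is involved, a selection of this kind is far lighter than a customized signal router, which is consistent with the run-time advantage claimed in the text.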

Figure 5 shows a manually generated specification 


AREA A 



Figure 5 — Specification of a set of relocations necessary 
to completely implement the master pattern of 
Figure 3 
of possible relocations that completely satisfies the 
master pattern of Figure 3, using the good circuit 
positions of the wafer in Figure 2. The longest relo- 
cation line length is less than 0.45 inch. Figure 6 
shows how the relocation in area A of Figure 5 can be 
accomplished without crossovers for a quad two-input 
gate cell. Each gate of the bad circuit at the lower left 
is functionally replaced with a good gate from the top 
right circuit. It should be noted that the computer 
needs only subroutines for leaving (or entering) a cell 
from the top, bottom, left, and right, for moving paral- 
lel lines over some number of cells, and for making 
ninety degree turns in order to do all the possible Pad 
Relocation routing patterns. Figure 7 shows the actual 
Pad Relocation of an SN5480 gated full adder above 
a silicon wafer using 0.002 inch aluminum lines on 
0.0035 inch centers. Figure 8 shows how simple the 
Pad Relocation mask is if it is considered as a set of 
the above mentioned subroutines. 

Intermediate step to full wafer LSI 

Figure 9 shows an intermediate step to full- wafer 
LSI using the Pad Relocation technique. Three 4-bit 
Modular Multiplier modules are to be fabricated from 
the three bordered half-inch square areas (as was sug- 
gested in a 1968 FJCC paper by D. F. Calhoun). 
Within the three bordered areas, slashes again repre- 
sent good circuits and circles show the master pattern 





Figure 6 — A set of pad relocations necessary to replace 
functionally the quad two-input gate circuit in 
area A of Figure 5 



Figure 7 — Pad Relocation of an SN5480 gated full 
adder above a silicon wafer (Using 0.002-inch 
aluminum lines on 0.0035-inch centers) 


locations. The lines terminating in arrowheads show 
how three, eight, and five good circuits can be relo- 
cated into the positions circled to establish the same 
pattern of good circuits for each module, thus allowing 
the use of one standard signal interconnect pattern 
for all subsequent modules tailored to that pattern. 

Figure 10 demonstrates the simplicity of a coding 
sheet specification of the necessary circuit relocations 



Figure 8 — Mask pattern for the pad relocations specified 
in Figure 5 
Figure 9 — Pad Relocation routing for three 200-gate 
modules on a single 1-J^inch wafer 



Figure 11 — Four relocation patterns for SN5480’s 


for the three multipliers of Figure 9. Figure 11 shows 
the four possible Pad Relocation interconnect patterns 
which are necessary for the LSI multipliers. For these 
modules it seems appropriate to incorporate simple 


signal cross-under lines and power distribution in 
the Pad Relocation level so as to require only two 
additional levels of interconnect above the tested 
LSI chips. 
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Figure 10 — Coding sheet specification 


A Pad Relocation LSI hardware program 

An LSI hardware development program, in which 
Hughes Aircraft Company contracted Texas Instruments 
to do the multi-level processing, began in January 1969 
and resulted in fully tested and packaged 207 gate 
arrays in May 1969. During this 
program, (1) TI fabricated and tested one type of 
their LSI wafers having a certain mix of gates and 
flip-flops, (2) TI supplied the yield information on 
each wafer to be processed for Hughes, (3) Hughes 
generated both the one standard signal interconnect 
mask for all wafers as well as an individual Pad Reloca- 
tion mask for each wafer, and (4) using the mask speci- 
fications from Hughes, TI processed the two additional 
levels of interconnect and tested and packaged each 
of the finished units. Similar programs for higher 
complexity arrays have since been initiated. The 
results of this program are described below. 

The logic array to be built in LSI 

Investigations were made three years ago at Hughes 
Aircraft Company into the application of LSI arrays 
to techniques for doing the very high speed sum-of-products 
computations required in advanced digital 
filtering systems. A result of this study was the 
development of the high speed “Modular Carry Advance 
Multiplier” which was described in a 1968 Fall Joint 
Computer Conference paper by D. F. Calhoun. Among 
its characteristics is its modularity which allows 
longer wordlength multiplications to be efficiently 
accomplished (in terms of speed and parts) simply by 
paralleling more of the identical modules. A 5-bit 
sign-and-magnitude Modular Multiplier designed with 
four types of logic gates and a JK flip-flop was thus 
chosen as the vehicle for LSI development on this 
program. Such an array forms and stores in a register 
the 9-bit sign-and-magnitude product of two 5-bit 
operands. The 5-bit multiplier design uses 153 NAND 
gates and 9 flip-flops (each equivalent to six NAND 
gates) for a total of 207 interconnected gates per LSI 
wafer. 

The logical interconnection of 207 gates using less 
than one square inch of an LSI wafer represents well 
any state-of-the-art bipolar LSI approach. Two levels 
of interconnect (including the Pad Relocation) were 
used above the tested wafer which already had a first 
level of metalization for component interconnect. 
In terms of cross-over complexity, signal linelengths, 
and circuit fan-outs, the Modular Multiplier design 
can be considered typical of a 200 gate logic array. 


Description of the chosen LSI slice 

The chosen semiconductor slice for this LSI development 
program was the Texas Instruments type “K” 
slice. Basically, the K slice is a bipolar array of 
transistor-transistor logic (TTL) gates and flip-flops 
occupying an active area of about 1.1 square inches. A 
picture of this LSI wafer is shown in Figure 12. The 
array is subdivided into 298 cells of dimension 0.084 
inch by 0.044 inch. Of the 298 basic wafer cells, 170 
are split into two 42 by 44 mil half-cells for gates while 
the 128 JK flip-flops on the wafer occupy full 84 by 
44 mil cells. The distribution of logic elements on the 
K slice is shown in Figure 13. Each cell labeled “3” 
has two independent three-input NAND gates while 
the adjacent cells labeled “5” have an independent 
five-input NAND gate and a one-input NAND gate. 
In three of the rows of gates a single seven-input 
NAND gate designated by a “7” was processed instead 
of two three-input NAND gates. The rows of full-sized 
84 by 44 mil cells contain the JK flip-flops, which 
are labeled “FF”. In total there are 642 logic gates 
(170 ones, 264 threes, 170 fives, and 38 sevens) and 
128 JK flip-flops processed on the wafer. 
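
The cell and gate counts quoted for the K slice can be cross-checked with a short tally. The Python sketch below uses illustrative variable names, and it assumes that half of the half-cells are "5" positions and the other half are "3" positions (some of which carry a "7" instead); that split is inferred from the quoted totals rather than stated in the text.

```python
# Cell counts quoted in the text for the type "K" slice.
TOTAL_CELLS = 298
GATE_CELLS = 170        # each is split into two half-cells
FLIP_FLOP_CELLS = 128   # one JK flip-flop per full cell

half_cells = 2 * GATE_CELLS  # 340 half-cells available for gates

# Assumption: half of the half-cells are "5" positions (one five-input gate
# plus one one-input gate); the rest are "3" positions, some replaced by "7".
five_positions = half_cells // 2
three_or_seven_positions = half_cells - five_positions
sevens = 38  # quoted in the text; each replaces two three-input gates

ones = five_positions
fives = five_positions
threes = 2 * (three_or_seven_positions - sevens)

gate_counts = {"ones": ones, "threes": threes, "fives": fives, "sevens": sevens}
total_gates = sum(gate_counts.values())

if __name__ == "__main__":
    print(gate_counts, total_gates)
```

Under that assumption the tally agrees with the quoted figures of 264 three-input gates and 642 gates in total.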



Figure 12 — Texas Instruments LSI type “K” slice 
(HAC Photo 4R07185) 

Figure 13 — LSI array slice “K” 

Selection of the master pattern and 
pad relocation patterns 

First, a master pattern of circuits was chosen to 
define the standard circuit positions on the K slice 
that would be interconnected to form the Modular 
Multiplier function. This master pattern (shown in 
Figure 14) was defined with respect to (1) maximizing 
the probability of successful fulfillment, $P_r(M)$, of 
the master pattern, (2) facilitating the standard signal 
interconnect, and (3) using a minimum number of 
relocation patterns efficiently. After the master pat- 
tern and the repertoire of relocation patterns to be 
used were determined, restricted areas in the Pad 
Relocation level were defined to allow signal cross- 
unders from the standard top level signal intercon- 
nect. Sufficient cross-under capability for this design 
was found in the flip-flop cells alone by using certain 
areas of these cells which are not required by any of 
the defined relocation patterns. Other cross-under 
areas can be defined for any more complex designs 
so as to still use only two metalization layers above 
the tested circuits. A set of Pad Relocation patterns 
was prepared to allow the efficient selection of the 
particular patterns and their positions necessary to 
fulfill each wafer's master pattern. The chosen set 
of K slice relocation patterns is shown in Figure 15. 
This semiautomated specification has facilitated a 
very fast turnaround and low cost capability for the 
generation of Pad Relocation masks and for working 
with new routing requirements, wafer layouts, and 
logic designs. 

Master Pattern Cell Designation Key: 

A = 1 input gate 
0 = 3 input gates 
□ = 4 input gates 
□ = JK flip-flop 

Figure 14 — Pad Relocation worksheet with master 
pattern locations shown 

LSI program results 

The end results of the Hughes effort described in 
this section were the two metalization mask specifi- 
cations used by TI to process each wafer. Only one of 
these is unique since the use of Pad Relocation allows 
all signal interconnect to be obtained from a once- 
generated standard mask. Figure 14 shows the work- 
sheet specification of how the yield of a typical LSI 
slice can be tailored to the chosen master pattern. 
The lines with arrow-heads at the end specify reloca- 
tion patterns from the set of patterns shown in Figure 
15. The completion of the K slice master pattern was 
accomplished successfully on each of the 30 wafers 
attempted. Completing and verifying the specification 
shown in Figure 14 by hand typically took one man 
about two minutes. 

From the specifications like those in Figure 14, the 
necessary relocation patterns were selected from the 
standard set shown in Figure 15 and were added to 
the standard cross-under pattern to complete the Pad 
Relocation mask such as the one shown in Figure 16. 
Only the particular circuit relocation patterns vary 
within this mask which allows the least possible varia- 
tion of interconnect and testing from one array to 
another. The more complex but standard mask is the 
one shown in Figure 17 which accomplishes all neces- 
sary signal interconnect (except the cross-unders to 
the Pad Relocation level) and the power distribution 
for the 5-bit multiplier design. The design for this 
mask can efficiently be done manually for arrays of 
this and larger size since the master pattern is well 
distributed. In mask plotting time alone, the Pad 
Relocation mask required only about 20 percent of the 
time required to plot the signal interconnect 
metalization patterns. A photograph of the final 207 gate 
LSI multiplier is shown in Figure 18. 

Statistics of Pad Relocation master patterns 

The choice of a master pattern for Pad Relocation 
is important since its definition affects the average 
number of relocated circuits (and thus the routing 
time and mask complexity) as well as the number and 
simplicity of the signal interconnect levels. Also a good 
statistical match between the master pattern and the 
expected wafer yield distribution will result in a higher 
probability of successful relocation. As an example, 
consider a master pattern that is defined too densely 
about a wafer’s periphery. Since peripheral wafer 
circuits show a much lower yield than the more central 
ones, there will statistically be more relocations, longer 
relocation lengths, more difficulty in satisfying the 
master pattern, and a higher concentration of signal 
interconnect above the master pattern than if the 
master pattern had been chosen to match the “expected” 
yield distribution as was done for the example 
shown in Figure 3. 

Figure 16 — Pad Relocation mask with standard 
cross-unders 

Figure 17 — 5-bit modular multiplier standard 
interconnect mask 

Figure 18 — 207 gate multiplier LSI array using Pad 
Relocation (HAC Photo 4R09152) 

A first question that must be answered is what is the 
“expected” yield distribution? Investigations thus far 
have pointed out only that there is a significant decrease 
in yield as a function of the distance from the wafer 
center which can be attributed to boundary defects, 
and that when good or bad circuits occur, there is a 
more than random clustering effect. No ability to 
predict the locations of these clusters has been obtained. 
What must be done is to examine the yield of large 
samples of the wafer types that will be used to de- 
termine the distribution that best describes their 
expected yield patterns. This distribution will be dif- 
ferent for different ranges of yield as well as for different 
circuit complexities and wafer types. The master pat- 
tern for a specific range of yield, wafer type, and wafer 
size should be matched to the expected distribution 
so as to take advantage of any knowledge of where 
good circuits are more probable. By so doing, the 
probability of successfully fulfilling a master pattern 
is maximized while minimizing the expected length of 
the longest relocations. 

Statistical techniques have been developed to de- 
termine and compare the efficiency of various master 
patterns in terms of maximizing both the utilization 
of good circuits and the probability of successfully 
fulfilling the master pattern. For example, if y is the 
percentage of the total circuits that were found to be 
good (i.e., the yield), m the percentage of total cir- 
cuits that are in the master pattern, and r the number 
of unused circuits from which a relocation could be 
made to each master pattern circuit, then the proba- 
bility of successfully fulfilling each master pattern 
circuit independently is: 


$$P(1) = y + (1-y)y + (1-y)^2 y + \cdots + (1-y)^r y = y \sum_{k=0}^{r} (1-y)^k \tag{1}$$

where the first term is the probability that the master 
pattern circuit itself is good, and each succeeding term 
is the conditional probability of needing to examine 
another candidate for relocation times its probability 
of being good. Equation (1) can be simplified as follows: 

$$y \sum_{k=0}^{r} (1-y)^k = y \, \frac{(1-y)^{r+1} - 1}{(1-y) - 1} \tag{2}$$

$$= -\left[ (1-y)^{r+1} - 1 \right] = 1 - (1-y)^{r+1} \tag{3}$$

therefore, 

$$P(1) = 1 - (1-y)^{r+1} \tag{4}$$

If the master pattern has a total of M circuits in 
it, then the joint probability of successfully fulfilling 
all of the M circuits becomes: 

$$P(M) = P(1)^M = \left[ 1 - (1-y)^{r+1} \right]^M \tag{5}$$

Equation (5) is based on an uncorrelated and pseudo- 
random distribution of good circuits (see Reference 10 
with Y>0.25) as well as the same assumption as 
Equation (1) that there are r circuits (good or bad) for 
each master pattern circuit from which a relocation can 
be made independently of the other master pattern 
circuits. It is, however, an unnecessary restriction to 
assign r circuit positions which could only be used to 
fulfill each master pattern circuit. Instead, consider 
successively examining up to r circuit positions which 
are the closest to each particular master pattern position 
and for which there is still a free path in the Pad 
Relocation level to the master pattern position. Then 
Equation (5) will give the probability of successfully 
relocating (if necessary) to each of the M required 
master pattern positions at least one of the r closest and 
free circuit positions. 
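
As a minimal sketch, Equations (4) and (5) can be evaluated numerically as follows. The function names are illustrative and not from the paper, and the independence assumption behind Equation (5) is taken as given. The sketch evaluates the closed forms only and makes no attempt to reproduce the curves of Figure 19.

```python
import math


def p_single(y: float, r: int) -> float:
    """Equation (4): chance that one master pattern position is filled,
    given yield y (as a fraction) and up to r relocation candidates."""
    return 1.0 - (1.0 - y) ** (r + 1)


def p_master(y: float, r: int, m_circuits: int) -> float:
    """Equation (5): chance that all M positions are filled independently."""
    return p_single(y, r) ** m_circuits


def largest_pattern(y: float, r: int, target: float) -> int:
    """Largest M for which P(M) is still at least `target` (0 < target < 1)."""
    p1 = p_single(y, r)
    if not 0.0 < p1 < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < y < 1 and 0 < target < 1")
    # P(1)^M >= target  <=>  M <= ln(target) / ln(P(1)), since ln(P(1)) < 0.
    return math.floor(math.log(target) / math.log(p1))


if __name__ == "__main__":
    for r in (4, 9):
        print(r, p_single(0.5, r), largest_pattern(0.5, r, 0.5))
```

The series form of Equation (1) and the closed form of Equation (4) agree term for term, which gives a quick check on the reconstruction of the geometric sum.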

Equation (5) determines a family of curves for 
$P_r(M)$ versus M for various yields and values of r. 
Figure 19 shows the curves of $P_r(M)$ versus M with 
y = 0.5 for r = 4 and r = 9. It should be noted that 
each circuit of M may actually be many interconnected 
gates of logic and M = 100 would represent 1000 gates 
if each circuit of M had 10 gates of equivalent logic 
complexity. If it is desired to successfully fulfill the 
master patterns of at least half the wafers considered, 
Figure 19 shows that 220 circuits (and thus probably 
750 or more gates) can be used if r = 4, and 680 
circuits can be used if r = 9. Of course, any wafers for 
which the master pattern was not easily fulfilled are 
not lost since they can be inventoried and used for 
other master patterns, or for integrated circuits, or 
diced and bonded to substrates. As a comparison the 
most complex current bipolar discretionary unit has 
an equivalent M of 169 while the 100 percent yield 
approach has reached an equivalent M of only 24. 

Figure 19 — The probability $P_r(M)$ of successfully 
fulfilling a master pattern of M circuits by relocating from 
one of up to r nearby circuits. Each circuit is a tested unit 
which may have many gates of logic complexity 

Advantage of Pad Relocation to LSI 
signal interconnect 

The prime advantage of Pad Relocation LSI which 
has been described above is that it places the pads of 
all used circuits in standard positions, which allows 
both fixed-pattern signal routing between these circuits 
and the utilization of more circuits than 
allowed by other LSI techniques. There are further 
advantages, however, to the routing of the standard 
signal interconnect. For example, the positions to 
which circuit pads will always be brought can be 
modified and optimized to facilitate the necessary routing 
of signals as well as to minimize the lengths of the 
longest or the most critical signal paths. This will also 
allow the standard signal interconnect to be designed 
to require the minimum number of levels and the 
minimum area per level. Thus, chip areas can be less 
interconnect limited. 

Improvement of testing and reliability of large 
scale integrated systems 

Semiconductor device reliability, as well as propa- 
gation delay, is highly dependent on proper main- 
tenance of junction temperatures within certain 
bounds. From the maximum specified junction tem- 
perature, a maximum power dissipation per wafer 
area can be computed which is dependent on the heat 
conductive characteristics of the wafer and the cooling 
techniques used, as well as on the area and power dis- 
sipation of the particular circuits. Thus there will be 
a maximum number of circuits that should be powered 
up on the wafer. In addition, no region of the wafer 
should exceed a certain maximum power density in 
order to insure that the wafer will not have relative 
“hot spots” where too many powered circuits are lo- 
cated. Pad Relocation LSI can help insure that the 
wafer power dissipation density is not excessive by 
specifying the relocated circuits to be primarily those 
from areas of sparse circuit utilization, thus obtaining 
a more uniform power density across the entire wafer. 
By so doing, the system cooling requirements can be 
relaxed and/or more circuits can be used on the same 
wafer. This more uniform power dissipation could be 
quite difficult to insure with other routing techniques 
since there is less choice in the used circuit positioning. 
A simple means by which a Pad Relocation computer 
program could insure a uniform power density would 
be to either count the number of powered circuits in 
various wafer regions as the Pad Relocations were 
being assigned or to assign all Pad Relocations, com- 
pute local power densities, and then reassign any 
necessary Pad Relocations to meet the maximum local 
power density. 
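
The first of the two approaches just described, counting powered circuits per wafer region while relocations are assigned, might look like the following Python sketch. Everything named here is hypothetical: cells are assumed to be (row, column) grid positions, regions are square blocks of `region_size` cells, and the limit is expressed as a simple count of powered circuits per region rather than a true power density.

```python
from collections import Counter


def region_of(cell, region_size):
    """Map a (row, col) cell position to the square region containing it."""
    row, col = cell
    return (row // region_size, col // region_size)


def powered_counts(used_cells, region_size):
    """Count the powered (used) circuits in each wafer region."""
    return Counter(region_of(cell, region_size) for cell in used_cells)


def choose_relocation(candidates, counts, region_size, max_per_region):
    """Pick the candidate lying in the least-loaded region that is still
    under the limit; return None when every candidate region is full."""
    best = None
    for cell in candidates:
        load = counts[region_of(cell, region_size)]
        if load >= max_per_region:
            continue  # powering this circuit would create a hot spot
        if best is None or load < counts[region_of(best, region_size)]:
            best = cell
    return best


if __name__ == "__main__":
    used = [(0, 0), (0, 1), (1, 1)]
    counts = powered_counts(used, region_size=2)
    pick = choose_relocation([(1, 0), (2, 2)], counts, 2, max_per_region=3)
    print(pick)
```

After a candidate is chosen, the caller would add it to the count for its region before assigning the next relocation.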

A most significant advantage to Pad Relocation 
LSI is that test pads can very easily be placed in 
standard positions in the top layer of wafer 
metalization. Since they are in standard positions, these 
test pads can readily be used to facilitate automated 
probe testing of interconnected wafers just prior 
to final encapsulation without requiring a large 
number of additional package leads. Especially for 
sequential arrays this will be important since it will 
allow both the pre-setting of the flip-flops to known 
states and the monitoring of their outputs without 
adding package leads. It is well known in testing 
theory that only by having control over the states of 
flip-flops can it be guaranteed that combinatorial-like 
tests will be found for a logic array, if they exist. Thus, 
the ability to define standard test pads will both allow 
automated probe testing and make the definition 
and execution of the required test sequences 
simpler. 

The definition of standard test pads has further 
applicability to systems partitioning, improvement 
of effective processing yield, fault diagnosis, and the 
testing of redundant networks. 


A consideration of the application of 
cryptographic techniques to data 
processing 

by R. O. SKATRUD 

IBM Corporation 

Research Triangle Park, North Carolina 


INTRODUCTION 

Two digital cryptographic techniques are described 
which may have potential applications in Data 
Processing Systems. A method of digital substitution 
analogous to a Vernam double tape system is presented, 
using a controlled combination of data and the contents 
of two memories. The second method uses a digital route 
transposition matrix that combines row and 
column transposition under memory control. Possible 
ways of achieving key leverage in each ciphering process 
are described. 

The large growth in digital computers, and the 
proliferation of computer usage to time-shared remote 
systems, present an increasing need to provide data 
security within a system as well as for data transmitted 
over communications media. 1 Two fundamental 
approaches to producing security in data use are developed 
in this presentation. One is a digital-substitution 
technique and the second involves a digital-matrix 
transposition. 

Some of the earliest practical cryptographic systems 
were the monoalphabetic substitution systems used by 
the Romans. 2 In these, one letter is substituted for 
another. For example, an A might be replaced by a C. 
In the fifteenth century, an Italian by the name of 
Alberti came up with a technique of cryptoanalyzing 
letters by frequency analyses. As a result, he invented 
what was probably the first polyalphabetic substitution 
system using a cipher disk. Thus, he would encode several 
words with one substitution alphabet, then he would 
rotate the disk and encode several more words with the 
next substitution alphabet. 

Early in the sixteenth century Trithemius, a Bene- 
dictine Monk, had the first printed book published on 
cryptology. Trithemius described the square table or 
tableau which was the first known instance of a pro- 
gressive key applied to polyalphabetic substitution. It 
provided a means of changing alphabets with each 
character. Later in the sixteenth century, Vigenere 
perfected the autokey: a progressive key in which the 
last decoded character led you to the next substitution 
alphabet in a polyalphabetic key. These were basically 
the techniques that were widely applied in the crypto- 
machines in the first half of the twentieth century. 
Various transposition techniques have been employed 
including the wide use of changing word order and 
techniques such as rail transcriptions (used in the 
Civil War). 

In 1883, Auguste Kerckhoffs, a man born in Holland 
but a naturalized Frenchman, published a book entitled 
La Cryptographie Militaire. In it, he established two 
general principles for cryptographic systems. They were: 

1. A key must withstand the operational strains of 
heavy traffic. It must be assumed that the 
enemy has the general system. Therefore, the 
security of the system must rest with the key. 

2. Only cryptoanalysts can know the security 
of the key. By this, he implies that anyone who 
proposes a cryptographic technique should be 
familiar with the techniques that could be used to 
break it. 

From these two general principles, six specific 
requirements emerged in his book: 

1. The key should be, if not theoretically unbreak- 
able, at least unbreakable in practice. 

2. Compromise of the hardware system or coding 
technique should not result in compromising the 
security of communications that the system 
carries. 

3. The key should be remembered without notes 
and should be easily changeable. 

4. The cryptograms must be transmittable by 
telegraph. Today this would be expanded to 
include both digital intelligence and voice (if 
voice scramblers are employed) utilizing either 
wire or radio as the medium. 

5. The apparatus or documents should be portable 
and operable by a single person. This require- 
ment is met in the systems proposed in this paper 
by the portability of the key in a dense storage 
medium (such as magnetic tape), installable in a 
processing system by one man. 

6. The system should be easy, neither requiring 
knowledge of a long list of rules nor involving 
mental strain. In the proposed systems, the key 
is an automatic-machine-controlled process until 
a key change occurs. 

In 1917 Gilbert S. Vernam, a young engineer at 
American Telephone and Telegraph Company, using 
the Baudot code (teletype), invented a means of adding 
two characters (exclusive or). Vernam’s machine mixed a 
key with text as illustrated by the following: 

Clear Text        1 0 1 1 1 
Key               0 1 0 1 0 
                  --------- 
Coded Character   1 1 1 0 1 

To derive the text from the coded character, all that was 
required was the addition of the key again to the coded 
character. 

Coded Character   1 1 1 0 1 
Key               0 1 0 1 0 
                  --------- 
Clear Text        1 0 1 1 1 
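
The bit-by-bit addition illustrated above is an exclusive or, and applying the same key twice restores the clear text. A minimal Python sketch of the five-bit example follows; the function name `vernam` is illustrative only.

```python
def vernam(bits, key):
    """Add two equal-length bit sequences modulo 2 (exclusive or)."""
    if len(bits) != len(key):
        raise ValueError("text and key must have the same length")
    return [b ^ k for b, k in zip(bits, key)]


clear_text = [1, 0, 1, 1, 1]
key = [0, 1, 0, 1, 0]

coded = vernam(clear_text, key)   # mix the key with the text
recovered = vernam(coded, key)    # add the key again to undo the mixing

if __name__ == "__main__":
    print(coded, recovered)
```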

His machines used a key tape loop about eight feet long 
which caused the key to repeat itself over a high volume 
of traffic. This allowed cryptoanalysts to derive the key. 
William F. Friedman, in fact, solved cryptograms using 
single-loop code tapes but appears to have been 
unsuccessful when two code tapes were used. Major 
Joseph O. Mauborgne (U. S. Army) then introduced the 
one-time code tape derived from a random noise source. 
This was one of the first theoretically (and in practice) 
unbreakable code systems. The major disadvantage of 
the system was the enormous amounts of key required 
for high-volume traffic. 

During the 1920’s and 1930’s, the rotor-code machines 
having five and more rotors, each rotor representing a 
scrambling step, were developed. They proved relatively 
insecure, requiring only high-traffic volume for the 
cryptoanalyst to break them. In fact, the Japanese used 
a code-wheel-type machine for their diplomatic com- 
munications well into World War II. It was vulnerable 
to cryptoanalysis, and William F. Friedman and his 
group not only solved the code but reconstructed a 
model of the machine to break Japanese diplomatic 
correspondence. Thus, President Roosevelt and others 
were aware of the impending break in diplomatic 
relations with Japan just prior to World War II. 

The code wheels (or rotors) were nothing more than 
key memories storing quantities of key which could 
easily be changed by interchanging rotor positions, 
specifying various start points for each rotor, and 
periodically replacing a set of rotors. This provided a 
means of producing what I will call key leverage. 

Digital substitution 

A system which uses the “exclusive or” technique 
developed by Gilbert S. Vernam, applied directly to data 
stored and distributed by a computer, is shown in 
Figure 1. 

Instead of using two tapes, this system would use two 
key memories and an address memory. Synchronization 
would be achieved by use of the address memory which 
would be addressed by the first transmitted intelligence. 
The contents of the two addresses obtained could come 
into the address registers which would pull key words 
from the associated addresses in each of the two key 
memories. Data to be transmitted would be first 
exclusive ORed with the contents of the first memory 
location and then with the contents of an address of the 
second memory. Each character transmitted would thus 
be encoded twice. 
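
The double encoding described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic of Figure 1: the names `build_memories` and `substitute` are hypothetical, the address memory is modelled as two lists of addresses (one per key memory), and the seeded `random.Random` generator merely stands in for the random noise source and is not suitable for real security.

```python
import random


def build_memories(m, word_bits, seed):
    """Create two key memories of m words and an address memory holding
    one permutation of the m addresses for each key memory."""
    rng = random.Random(seed)  # stand-in for a random noise source
    key1 = [rng.getrandbits(word_bits) for _ in range(m)]
    key2 = [rng.getrandbits(word_bits) for _ in range(m)]
    slot1 = rng.sample(range(m), m)  # addresses into key memory 1
    slot2 = rng.sample(range(m), m)  # addresses into key memory 2
    return key1, key2, slot1, slot2


def substitute(words, pairs, key1, key2, slot1, slot2):
    """Exclusive-OR each data word with one word from each key memory.
    `pairs` gives, per word, the two address-memory locations to use."""
    if len(words) != len(pairs):
        raise ValueError("one address pair is needed per data word")
    out = []
    for word, (loc1, loc2) in zip(words, pairs):
        k1 = key1[slot1[loc1]]   # first level of encoding
        k2 = key2[slot2[loc2]]   # second level of encoding
        out.append(word ^ k1 ^ k2)
    return out


if __name__ == "__main__":
    memories = build_memories(m=8, word_bits=5, seed=1)
    data = [0b10111, 0b00001, 0b11111]
    pairs = [(0, 0), (1, 1), (2, 2)]
    coded = substitute(data, pairs, *memories)
    print(coded, substitute(coded, pairs, *memories))
```

Because the exclusive or is its own inverse, the receiving end runs the same routine with the same memories and address pairs to recover the data.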

This would represent an element of security 
dependent on the contents of the two key memories. 
Order-of-address usage of the key would be dependent on the 
contents of the address memory. To derive the key, 
contents of the key memories and address memory 
would have to be solved. The larger the memory 
contents, the harder these would be to determine. 
A large volume of traffic, where starting points in the 
address control memory would be repeated, could begin 
to provide clues that could be used to derive the key. 
Therefore, one would, at frequent intervals determined 
by usage, change the content of the address memory. 

Figure 1 — Digital substitution logic 

At less frequent intervals, one would change the 
contents of the key memories. These intervals would be 
chosen again on the basis of data traffic using the 
system and the type of security expected from the 
system. 

The relative security of the system would be a 
function of the amount of memory. If a memory of 
n bits is considered, total permutations available in the 
memory of those bits would be $2^n$. If the key is derived 
from a random noise source, probabilities of getting all 
0's or all 1's in the memory are very small, as would 
be those of large imbalances between 0's and 1's. Therefore, 
each key memory would have a distribution in the total 
bit field available approximating a distribution of bits 
whose permutations in practice would be more in the 
neighborhood of $2^{n/2}$. Each of the two key memories 
would have one of that many practically usable 
permutations, each one of which could operate on the 
other in the encoding process. Therefore, by probability 
theory, the probable permutations would be the product 
of the two memory potentials, or $(2^{n/2})(2^{n/2})$, giving a 
potential key field of $2^n$ permutations. 


The $2^n$ possible permutations of the key memories 
would also be acted on by the m addresses of each 
memory which would all exist in any order in the 
address memory. Possible permutations of addresses, 
taking them m at a time, would be m factorial for each 
of the memories. Therefore, each of the m-factorial 
possible address permutations for one memory could 
operate on each of the m-factorial possible address 
permutations of the other memory. 
This would represent a total of $(m!)^2$ possible 
permutations of the addresses. 8 

Therefore, if one were to completely break the key, 
one would have to derive the one permutation used out 
of a total of $(m!)^2 \, 2^n$ possibilities. Heavy 
traffic on the system, with repetition of the key, would, 
however, give handles to the cryptoanalyst in deriving 
the key, so it could not be considered unbreakable. 

It is possible to achieve a system which would be 
unbreakable in theory and achievable without using 
great amounts of key. This is achievable by using a 
one-time key with techniques of producing key leverage. 
Since two memories are used for key, and each memory 
has addresses associated with it in the address memory, 
one can achieve key leverage by the fact that different 
combinations of the contents of the addresses of the key 
give different coding results. Proper choice of address 
usage in the address memory will insure that each 
message that is transmitted would be encoded with a 
unique code until all the combinations of the addresses 
were used for the two key memories. 

It is known that the Address Memory contains one 
of m! permutations possible in the m addresses for each 
key memory. If it is assumed that each memory location 
contains a character in key memory, that somewhere in 
the address memory is the address of that character, 
and that each address is one address memory location, 
then a practical means of control begins to emerge. If, 
for example, m is considered to be 1,000 addresses and a 
usage scheme is used similar to that outlined in Table I, 
synchronization would be achieved by message number- 
ing consecutively from 000 to 999. 

The first character transmitted would use the 
contents of address 000 for Key Memory 1 and 000 for 
Key Memory 2. The second character transmitted 
would use the contents of address 001 in Address 
Memory 1, and the contents of address 001 in Address 
Memory 2. This progression could continue to address- 
memory-location 999 for each of the two address- 
memory slots. 

The second message transmitted would be numbered 
001. The address-register pairs for message number 2 
would now be 000 and 001 for the first character. The 
second character would be 001 and 002. The address 
register would therefore be using different address pairs 
for the second message than it did on the first. 

TABLE I — Address memory usage 

Message     Address      Address 
Number      Memory 1     Memory 2 

000         000          000 
            001          001 
            002          002 
            ...          ... 
            999          999 

001         000          001 
            001          002 
            002          003 
            ...          ... 
            999          000 

002         000          002 
            001          003 
            ...          ... 
            999          001 

...         ...          ... 

500         000          500 
            001          501 
            ...          ... 
            999          499 

999         000          999 
            001          000 
            002          001 
            ...          ... 
            999          998 

Refresh Address Memory and repeat cycle. 

The 501st message would bear number 500. The 
address pairs for the first character transmitted would 
now be 000 and 500. The second character transmitted 
would use address-memory locations 001 and 501. 

Therefore, it can be seen that by continuing the 
sequence through message 1000 bearing number 999, no 
repetition of address pairs will exist. Therefore, with m 
equal to 1,000, two key memories, and two address 
memories, the system limit, if used in this way, is 
1,000 messages of 1,000 characters each. 
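
The usage pattern of Table I amounts to offsetting the second address-memory location by the message number, modulo m. The Python sketch below is one reading of the table rather than a stated rule from the paper; the function name `address_pairs` is illustrative.

```python
def address_pairs(message_number, length, m):
    """Address-memory locations used for each character of one message:
    location c in slot 1, and (c + message_number) mod m in slot 2."""
    if not 0 <= message_number < m or not 0 <= length <= m:
        raise ValueError("message number and length must not exceed m")
    return [(c, (c + message_number) % m) for c in range(length)]


def all_pairs_unique(m):
    """Check that m messages of m characters never reuse an address pair."""
    seen = set()
    for number in range(m):
        for pair in address_pairs(number, m, m):
            if pair in seen:
                return False
            seen.add(pair)
    return len(seen) == m * m


if __name__ == "__main__":
    print(address_pairs(1, 3, 1000))
    print(all_pairs_unique(50))
```

With m messages of m characters, every one of the m squared possible pairs is used exactly once, which is the system limit stated above.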

At the time the system limit is reached, one would 
change the address memory by supplying a new 
permutation of addresses for each of the two 
address-memory slots. This would provide the capability of 
transmitting and receiving another 1,000 messages of 
1,000 characters each. It can be seen that the system 
employs a progressive-key system and, in theory, one 
could use m! combinations of addresses in each of the 
two address-memory slots before obvious key repetition 
would begin, without changing the contents of the key 
memories. In practice, however, one would, at 
predetermined intervals, change the contents of the key 
memories. 

It can be shown that the system is modular. By 
adding a third key memory and a third address memory 
slot, the system would be expanded to 1,000,000 
messages each with a capacity of 1,000 characters. It can 
also be shown that a trade-off exists between message 
length and number. For example, if message length were 
defined to be a maximum of 100 characters instead of 
1,000, the message count on the expanded system could 
go to 10,000,000 messages before the key would be 
repeated. 

In a system using two levels of encoding and m =
1,000 at a transmission rate of 2,000 bits per second, the
key will last for 1.4 hours of continuous transmission
before the address slots in the address memory would
have to be changed. This assumes that 10 bits are
present in each key memory address. If transmission
loading was 50%, this figure would go to 2.8 hours.
Therefore, with heavy traffic, the Address Memory
Contents would have to be changed two or three times
per day. This could be arranged by pre-storing numbers
of changes on a dense-storage medium such as magnetic
tape.

Higher usage rates would require higher rates of 
change for the Address Memory and/or a modular 
expansion of the key system. Therefore, the system is 
applicable to any rate of key usage that is in use today. 
It is also modular, as can be seen, by choice of m. 

Thus, it is possible to use a system of digital sub- 
stitution in a cryptographic computer system which 
would, if system design parameters were properly 
chosen, deny access to data in a system to all who did 
not possess the cryptographic key. The system described 
here is basically a polyalphabetic substitution system.
It employs the fundamental techniques employed by
Vernam and would also include some of the
characteristics of the rotor machines in achieving leverage in
the number of permutations available on data. It is,
however, different since we are now operating on the
digital makeup of the intelligence rather than on the
character as an entity, and we use electronics instead of
the mechanical rotor. We also avoid repetitive use of the
key, which was the reason that rotor-machine codes were
finally broken.


TABLE II — Effect of matrix size on permutations 
Matrix Size: $n$          Read-Write Permutations: $(n!)^2$


Digital route transposition 

Transposition techniques can also be used in conjunc- 
tion with data processing. If the route transposition 
technique is applied to the read-in and read-out of 
digital data from a matrix, it is possible to achieve the 
results of polyalphabetic substitution without a direct 
substitution key being required. It can be shown that 
key usage is far less than that required for direct 
substitution. With the data-key leverage obtained, 
some interesting possibilities on key transmission can be 
obtained. With these, it becomes more feasible to 
explore the possibility of single-use keys. 

To illustrate the method, let us consider an $n^2$ matrix
where $n = 8$. The matrix will be made up of 8 rows and 8
columns. Information can be read into and out of the
8 columns of the matrix in any order.

The information would be transmitted into the 
receiving-matrix columns in the same order that it left 
the transmitting matrix. To complete the data recon- 
struction, the information in the receiving matrix now 
would be read out in the same row order that it entered 
the transmitting matrix. Therefore, the process is 
reversible. 

Figure 2 shows the base matrix. If an 8-by-8 matrix is
considered, there are 8 factorial different orders possible
in both the rows and columns. For any one matrix of
information (64 bits), there are a possible $(8!)^2$ ways of
seeing this information when transmitted.² Eight
factorial squared gives approximately $1.6 \times 10^9$ possible
permutations on each matrix. Since it would be the
function of the key to select each matrix permutation,
each matrix would be transmitted with a different key.
Table II shows the effect of varying matrix size in terms
of available permutations on the data.

Figure 2 — Digital route transposition matrix

The elements of control required for the rows and
columns of the matrix must be independent. To keep
control independent, Row Key and Column Key
Memories can be used. Since in the example chosen there
are 8 rows and 8 columns, there are 8! different possible
orders to read into or out of each matrix. Because $2^{16}$
(65,536) is of the same order as 8! (40,320) and not less
than it, 16 bits will be sufficient for reading information
into or out of the rows of the matrix. Likewise, 16 bits
will allow information to be read into or out of the
8 columns. Therefore, 32 bits of key are required to
encode and decode 64 bits of information using a Digital
Route Transposition Matrix.
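
The read-in and read-out procedure can be illustrated with a minimal Python sketch. The text does not say how a 16-bit key word selects one of the 8! orders, so the mapping used here (factorial-base decoding of the key, reduced modulo 8!) is purely an assumption for illustration, as are all function names.

```python
import math


def permutation_from_key(key, n=8):
    """Decode an integer key into a permutation of range(n).

    Factorial-base (Lehmer code) decoding: an assumed mapping, chosen only
    because every key value below n! yields a distinct permutation.
    """
    key %= math.factorial(n)
    remaining = list(range(n))
    order = []
    for i in range(n, 0, -1):
        index, key = divmod(key, math.factorial(i - 1))
        order.append(remaining.pop(index))
    return order


def encode(bits, row_key, col_key, n=8):
    """Write n-bit groups into rows in keyed order, then read columns out in keyed order."""
    rows = permutation_from_key(row_key, n)
    cols = permutation_from_key(col_key, n)
    matrix = [None] * n
    for k, r in enumerate(rows):
        matrix[r] = list(bits[k * n:(k + 1) * n])
    return [matrix[r][c] for c in cols for r in range(n)]


def decode(bits, row_key, col_key, n=8):
    """Refill columns in the order they were sent, then read rows in entry order."""
    rows = permutation_from_key(row_key, n)
    cols = permutation_from_key(col_key, n)
    matrix = [[0] * n for _ in range(n)]
    for k, c in enumerate(cols):
        for r in range(n):
            matrix[r][c] = bits[k * n + r]
    return [bit for r in rows for bit in matrix[r]]
```

Calling `decode(encode(data, rk, ck), rk, ck)` with a 64-element list of bits returns the original list, which is the reversibility property described above. Because 65,536 exceeds 40,320, some 16-bit key values map to the same order under this particular mapping; a real design would have to account for that.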

By looking at direct key usage and comparing it to 
key usage described on Digital Substitution, we find 
that key consumption per transmitted bit is reduced by 
a factor of 4. For each 64 bits transmitted by the 
Digital Route Transposition Matrix, 32 bits of key are 
used. With Digital Substitution 2 bits of key are used 
for each bit transmitted. 

If 1,000 addresses are assumed in each of two address
slots in Address Memory, the potential for applying
them to something analogous to message number again
exists. However, since each step of the Address Memory
now transmits a full matrix of information,
synchronization would now be achieved by matrix count
instead of message number. Therefore, one would step
through the address count. If 1,000 addresses were used
in each of the row and column memories, one would step
through the address-register counts 1,000,000 times,
pairing up a different row and column address count
every step. In terms of usable bits available for
transmission, this would yield 64,000,000 data bits. If the
system were transmitting 2,000 bits per second, this would
represent approximately 8.9 hours of continuous
transmission. If transmission utilization was 50%, this
would represent approximately 17.8 hours of
transmission. It would be applicable to any data rate by
varying the address memory-change rate. At the end of
the 64,000,000 bits, the address memories would be
refreshed with a new permutation and the process would
continue. Thus, no key repetition would occur.
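
The lifetime figures quoted for the two schemes can be checked with a short calculation. The sketch below assumes that each address-pair step of the substitution scheme accounts for 10 transmitted bits and that each step of the transposition scheme accounts for one 64-bit matrix; the function name and parameters are illustrative only.

```python
def key_lifetime_hours(steps, bits_per_step, bits_per_second, utilization=1.0):
    """Hours of transmission before the address permutations are exhausted."""
    total_bits = steps * bits_per_step
    seconds = total_bits / (bits_per_second * utilization)
    return seconds / 3600.0


# Digital substitution: 1,000 messages of 1,000 characters, 10 bits per step (assumed).
substitution_hours = key_lifetime_hours(1_000_000, 10, 2_000)

# Route transposition: 1,000 x 1,000 address pairs, one 64-bit matrix per pair.
transposition_hours = key_lifetime_hours(1_000_000, 64, 2_000)
transposition_half_load = key_lifetime_hours(1_000_000, 64, 2_000, utilization=0.5)

# Key bits consumed per data bit: 2 for substitution, 32/64 for transposition.
key_reduction_factor = 2 / (32 / 64)
```

With these assumptions the substitution scheme lasts about 1.4 hours and the transposition scheme just under 9 hours at full load, and the ratio of key consumption per transmitted bit comes out to 4, in line with the figures in the text.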

If data rates were very high, one could consider
transmitting the address information encoded in the 
one-time key. This could be accomplished by the 
addition of an Address Buffer Memory which would be 
loaded prior to the point where the system ran out of 
address permutations. At the point of run-out, a new 
permutation would be moved out of the Address Buffer 
into the Address Memory and the process would 
continue. Since both ends of a system must be syn- 
chronized, the transfer would always occur simul- 
taneously at both ends of the system. 

After some predetermined number of address-permu- 
tation changes, the key would be changed in the Row 
and Column Key Memories. This change would not 
have to be frequent unless a key compromise was 
suspected. Frequency of change will normally be 
determined by choice of memory size and other design 
parameters in the system. 

Figure 3 represents a diagram of the transmitting 
function for a Digital Route Transposition Matrix. An 
address followed by data would come into the system. 
The starting address for the address memory would be 
also transmitted to the receiving station. The station 
would select a pair of key addresses in the address 
memory which would pull the contents from Row-Key 
and Column-Key memories to activate the row- and 
column-scan selection control. 

The row-scan selection control would activate read-in 
to one matrix of data, which would fill all of its rows 
with data. At that point in time, column-scan selection 
control would take over and begin transmitting data 
from the first filled matrix. While the first matrix is 
transmitting, a new row-key word would be brought out 
by stepping the address-selection logic to get the next 
row key from Row-Key Memory. 

When the first matrix has finished transmitting,
address-selection would supply the second address for
the Column-Key Memory. With proper choice of timing
relationships, continuous data transmission would occur
by permitting encoded data from matrix A or B to enter
the line. While one matrix is transmitting, the second
would be filling with data.

Figure 3 — Digital route transposition matrix

The receive function would be the reverse of the 
transmitting function. It can be seen that a double 
matrix is required to secure continuous transmission. 

Error detection and recovery 

Existing techniques of error detection could be 
employed. When techniques of ciphering are used on a 
system employing transmission lines, it would be 
possible to use a polynomial accumulation which comes 
very close to being unique for each block of data 
transmitted. The one- or two-character accumulation 
would be transmitted at the end of the block and 
compared to that generated by the received data. An 
error would be assumed only if there was a difference. 
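
A polynomial accumulation of this kind is what is now usually called a cyclic redundancy check. The following sketch shows a two-character (16-bit) accumulation appended to a block and verified on receipt. The text does not name a polynomial, so the generator 0x1021 and the initial value 0xFFFF are assumptions, and the helper names are hypothetical.

```python
def crc16(data, poly=0x1021, init=0xFFFF):
    """Bitwise 16-bit polynomial accumulation over a block of bytes."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def frame(block):
    """Append the two-character accumulation to the end of the block."""
    return block + crc16(block).to_bytes(2, "big")


def accept(received):
    """Recompute the accumulation from the received data and compare."""
    block, tail = received[:-2], received[-2:]
    return crc16(block) == int.from_bytes(tail, "big")
```

A receiver that finds `accept(...)` false would return a negative acknowledgment for that block.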

Error recovery would be initiated by transmittal of a 
negative acknowledgment to that block of data. Action 
at the transmitting end would then consist of retrans- 
mission of the data block with the same address 
designations for key usage being held. This would be 
necessary to prevent the same text from being trans- 
mitted twice with different keys. Transmission twice 
with different keys could provide information to a
cryptanalyst that could possibly permit breaking a
portion of a key.

Polynomial accumulations would be equally
applicable to either the double substitution or transposition
technique. Accumulations could be done either before or
after encoding, depending on the handling of
transmission control characters.

Another possible error detection technique, particu- 
larly with a transposition matrix, would be the utiliza- 
tion of horizontal and vertical parity assignments. The 
reliability of the technique would require evaluation for 
the particular application to determine what types of 
multiple-bit errors would result in lack of error detection. 
It would be used only if the probability of detection 
were sufficiently high for the application. Again, 
detection of the error would require re-transmission by 
the same key-address designators that were used for the 
original transmission. 
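
The limitation mentioned above can be made concrete with a small sketch of horizontal and vertical parity over a bit matrix. It is an illustration only (the function names are hypothetical): a single-bit error disturbs one row parity and one column parity, whereas four errors at the corners of a rectangle leave every parity unchanged and so escape detection.

```python
def parities(matrix):
    """Return (row parities, column parities) for a matrix of 0/1 values."""
    row_parity = [sum(row) % 2 for row in matrix]
    col_parity = [sum(col) % 2 for col in zip(*matrix)]
    return row_parity, col_parity


def flip(matrix, positions):
    """Return a copy of the matrix with the bits at the given (row, col) positions inverted."""
    copy = [row[:] for row in matrix]
    for r, c in positions:
        copy[r][c] ^= 1
    return copy
```

Comparing `parities(original)` with `parities(flip(original, [(1, 2), (1, 5), (4, 2), (4, 5)]))` shows the undetected rectangular pattern; this is the kind of multiple-bit error an application would need to evaluate.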

Both systems would satisfy all the criteria and
principles established by Auguste Kerckhoffs' book
published in 1883. There would be no requirement for
manual intervention unless maintenance was required.
Since a one-time code would be used, it is in theory
unbreakable. The security of the system rests with the
key, not the hardware. In operation, the key is
undergoing continuous change automatically and/or under
control of an operator, depending on application and
specific hardware design. Both systems would be simple,
and for all practical purposes, transparent to the
operators. Operators would handle the clear information
as is done today, even in the most confidential types of
operations.

Selection of operators would remain a management 
function, as it is today. This system would be designed 
to prevent unauthorized proliferation of confidential 
information by direct access from other I/O devices that 
do not possess the key, in spite of the fact that they 
may have the hardware. 

With integrated circuits becoming available and the 
cost per circuit function decreasing, it becomes possible 
to consider undertaking designs that would offer 
relatively high degrees of privacy in computer systems 
at reasonable cost. High-density memories and the
technology to support the logical control of
cryptographic systems exist today.


Security controls in the ADEPT-50
time-sharing system

“Authority intoxicates / And makes mere
sots of magistrates” — Butler

FOREWORD 

At present, the system described in this paper has not 
been approved by the Department of Defense for 
processing classified information. This paper does not 
represent DOD policy regarding industrial application 
of time- or resource-sharing of EDP equipment. 

INTRODUCTION 

Computer-based, resource-sharing systems are, and
contain, things of value; therefore, they should be
protected. The valuables are the information data
bases, the processes that manipulate them, and the
physical plant, equipment, and personnel that form the
system plexus. An extensive lore is developing on the
subject of system protection.¹,² Petersen and Turn⁸
discuss in considerable detail the substance of protection
of non-military information systems in terms of threats
and countermeasures. Ware⁴⁻⁶ contrasts “security” and
“privacy” for viewing protection in military systems as
well. This paper describes the security controls
implemented in the ADEPT-50 time-sharing system,⁶ a
resource-sharing system designed to handle sensitive
information in classified government and military
facilities.*

Our approach to security control is based on a
set-theoretic model of access rights. This approach appears
natural, since the important objects of security are sets
of things (users, terminals, programs, files) and the
operators of set theory (membership, intersection,
union) are easily programmed for, and quickly
performed by, computer. The formal model defines
time-sharing security control of user, terminal, job, and
file security objects in terms of equations of access based
upon their security profiles: a triplet of Authority,
Category, and Franchise property sets. The
correspondence of these properties to government and military
Classification, Compartments, and Need-to-Know is
demonstrated. Implementation of the model in the
ADEPT-50 Time-Sharing System is described in detail,
as are features that transcend the model including
initialization of the security profiles, the LOGIN
decision procedure, system integrity checks, security
residue control, and security audit trails. Other novel
features of ADEPT security control are detailed and
include: automatic file classification based upon the
cumulative security history of referenced files; the
“security umbrella” of the ADEPT job; and once-only
passwords. The paper concludes with a recapitulation
of the goals of ADEPT security control, approximate
costs of implementation and operation of the security
controls, and suggested extensions and improvements.

* Development of ADEPT was supported in part by the
Advanced Research Projects Agency of the Department of Defense.

Historically, protection of a sensitive computer
facility has been attained by limiting physical access to
the computer room and shielding the computer complex
from electromagnetic radiation. This “sheltered”
approach promotes one-at-a-time, batch usage of the
facility. Modern hardware and software technology has
moved forward to more powerful and cost/effective
time-shared, multi-access, multiprogrammed systems.
However, three features of such systems pose a challenge
to the sheltered mode of protection: (1) concurrent
multiple users with different access rights operating
remote from the shielded room; (2) multiple programs
with different access rights co-resident in memory; and
(3) multiple files of different data sensitivities
simultaneously accessible. These features appear to violate
traditional methods of accountability based upon a
single user (or multiple users with like clearances)
operating within strictly controlled facilities. The
problem is of such magnitude that no time-sharing
system has yet been certified for use in the manner
described. However, some multi-access systems are in
operation in a classified mode,⁷,⁸ and a number of
design approaches have been suggested.⁹,¹⁰,¹¹,¹²

In addition to the usual goal of building an effective
time-sharing system,¹⁸ the ADEPT project began with a
number of security objectives as well:

1. Build a security control mechanism that supports
heterogeneous levels and types of classifications.

2. Design the security control mechanism in such a
manner that it is itself unclassified until primed
by security configuration parameters, a point
strongly supported by Baran¹⁴ regarding
communications security.

3. Construct the security control mechanism as an 
isolated portion of the total time-sharing system 
so that it may be carefully scrutinized for 
correctness, completeness, and reliability. 

4. Do the above in as frugal a manner as possible, 
considering costs to design, fabricate, and 
operate. Good system performance is our prin- 
cipal criterion in selecting among alternative 
technical solutions, as noted by the author 
elsewhere. 16 

In approaching our task, we recognize security as a
total system problem involving hardware,
communication, personnel, and software safeguards. However, our
focus is primarily on monitor software, and its interfaces
with the other areas. This view is not parochial: our
hardware is a standard IBM 360 model 50;
communication security is an established field of study with
considerable technological know-how;³⁴ and the policy,
doctrine, and procedures for personnel behavior in
classified environments are extensive, with legal
foundations. Thus, our only degree of freedom is the control we
build into the time-sharing executive software.

A security control formalism 

A formal model of software security control for access 
to sensitive portions of ADEPT is developed here. 

Security objects 

Four kinds of security objects are to be managed by
our model: user, terminal, job, and file. Let u denote
some user; t some terminal; j some job; and f some file.

Security properties 

Each security object is described by a security profile 
that is an ordered triplet of security properties— Au- 
thority (A), Category (C), and Franchise (F). Authority 
is a set of hierarchically ordered security jurisdictions. 
Category is a set of discrete security jurisdictions. 
Franchise is a set of users licensed with privileged 
security jurisdiction. 

The property “Authority” is defined as a set A, where

    $A = \{a^0 < a^1 < \cdots < a^\omega\}$    (1)

and the specific members, $a^i$, of the set are security
jurisdictions hierarchically ordered.

“Category” is a discrete set of specific compartments,

    $C = \{c^0, c^1, \ldots\}$    (2)

Compartments are mutually exclusive security
sanctuaries with discrete jurisdictions.

“Franchise” is a security jurisdiction privileged to a
given set of users, i.e.,

    $F = \{u \mid u \text{ is a user}\}$    (3)

For a given terminal, t, let a given Authority set, A,
be denoted by $A_t$, or in general, for a given security
object, $\alpha$, denote a given property, P, for $\alpha$ as $P_\alpha$. Hence
we can speak of $A_u$, or $C_j$, etc., to mean the specific
Authority set for a given user, u, or the specific Category
set for a given job, j, respectively.

Four important sets (of users) arise with respect to
the Franchise property, namely, Franchise for files,
terminals, jobs, and users. To distinguish the sense in
which a given user is being considered, we subscript u
by the security object under consideration. Hence, $u_f$
means the user with jurisdiction to file f; $u_t$ and $u_j$ are
similarly defined. For completeness, we define $u_u$ as
simply u. We can now define Franchise for each
security object.


    $F_u = \{u\}$    (4)

    $F_t = \{u_t^0, u_t^1, \ldots, u_t^\lambda\}$    (5)

    $F_j = \{u_j^0, u_j^1, \ldots, u_j^\mu\}$    (6)

    $F_f = \{u_f^0, u_f^1, \ldots, u_f^\nu\}$    (7)


Equation (4) states that the Franchise for a user is
restricted to himself; his jurisdiction is unique, and no
other user is so endowed. Equation (5) states that the
terminal Franchise is possessed by $\lambda$ different users who
have jurisdiction over the terminal t. Likewise,
equations (6) and (7) define the job and file Franchise sets.

In security discussions, one hears the familiar phrase, 
“he needs a higher-level clearance.” We can now define 
“higher level” with our model. 

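Let $\alpha$ and $\beta$ be security objects and let p be some
function such that $p(A_\alpha) \in A$.

Then,

    $A_\alpha \ge A_\beta \leftrightarrow p(A_\alpha) \ge p(A_\beta)$    (8)

    $C_\alpha \ge C_\beta \leftrightarrow C_\alpha \supseteq C_\beta$    (9)

    $F_\alpha \ge F_\beta \leftrightarrow F_\alpha \supseteq F_\beta$    (10)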


Equation (8) claims that the Authority of a security
object, $A_\alpha$, is at a “higher level” than that of another
security object, $A_\beta$, when the specific authority, $a_\alpha$, is
greater than the specific authority, $a_\beta$.

It is implicit in equations (1) and (8) that the specific
authorities, $a^i$, must be numerically encoded for the
magnitude relationships to hold. Equations (9) and (10)
define $P_\alpha$ to be greater than $P_\beta$ if and only if $P_\beta$ is a
subset of $P_\alpha$.
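
As a minimal sketch of these ordering relations, Authority can be encoded as an integer and Category and Franchise as sets, so that the comparisons reduce to a numeric test and a superset test. The level names and numeric codes below are hypothetical examples, not values taken from ADEPT.

```python
# Hypothetical numeric encoding of hierarchically ordered authorities.
LEVELS = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def authority_ge(a_alpha, a_beta):
    """Authority ordering: compare the numerically encoded authorities."""
    return a_alpha >= a_beta


def set_ge(p_alpha, p_beta):
    """Category or Franchise ordering: p_beta must be a subset of p_alpha."""
    return frozenset(p_alpha) >= frozenset(p_beta)
```

Note that the set comparison is only a partial order: two Category sets such as {"X"} and {"Y"} are not comparable in either direction, so neither object is at a higher level than the other with respect to Category.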

Events may alter the membership of property sets.
Let $P_f^e$ be the eth $P_f$ in a given context.

Define the Authority history, $A_h$, at the eth event as

    $A_h(0) = a^0$    (11)

    $A_h(e) = \max(A_h(e - 1),\; p(A_f^e)), \quad e > 0$    (12)

Likewise, define the Category history, $C_h$, at the eth
event as

    $C_h(0) = \emptyset$    (13)

    $C_h(e) = C_h(e - 1) \cup C_f^e, \quad e > 0$    (14)

Equations (11) through (14) recursively define two
useful sets that accumulate a history of file references as
a function of file reference events, e. A history of the
highest Authority, $A_h$, is defined by equation (12) as
either the previous set, $A_h(e - 1)$, or the current set,
$p(A_f^e)$, whichever is larger in the sense of equation (8).
Equation (11) gives the initial condition as some low
specific file authority, $a^0$. Equation (14) defines the
highest Category history as the union of the previous
set, $C_h(e - 1)$, and the current set, $C_f^e$; while equation
(13) states that the union is initially the empty set.

Though $F_h$ could be defined in our model, no need is
seen at this time for a Franchise history. More will be
said about these history sets later.

Property determination

Table I presents in a 3 × 4 matrix a summary of the
rules for determining the security profile triplets, $P_\alpha$.
We shall examine these rules here. For the user u,
$A_u$ and $C_u$ are given constants, and $F_u$ is given by
equation (4). For the terminal t, $A_t$ and $C_t$ are given
constants, and $F_t$ is given by equation (5). Given $A_u$ and
$A_t$, we determine $A_j$ as:

    $A_j = \min(A_u, A_t)$    (15)

Likewise, given $C_u$ and $C_t$, we determine $C_j$ as:

    $C_j = C_u \cap C_t$    (16)

Equation (6) gives $F_j$ to complete the job security
profile triplet.

An existing file has its security profile predetermined
with $A_f$ and $C_f$ as given constants, and $F_f$ as given by
equation (7). However, a new file (one just created)
derives its security profile from the job's file access
history according to the following:

    $A_f = A_h(e)$    (17)

    $C_f = C_h(e)$    (18)

    $F_f = u_j^i$    (19)

TABLE I — Security property determination matrix

Object               Authority, A                       Category, C                  Franchise, F

User, u              Given constant                     Given constant               u
Terminal, t          Given constant                     Given constant               u_t^i
Job, j               min(A_u, A_t)                      C_u ∩ C_t                    u_j^i
File, f (existing)   Given constant                     Given constant               u_f^i
File, f (new)        max(A_h(e-1), p(A_f^e)), e > 0     C_h(e-1) ∪ C_f^e, e > 0      u_j^i

From equations (11) through (14) we see how the
Authority and Category histories accumulate as a
function of event e. These events are the specific times
when files are accessed by a job. To maintain security
integrity, these histories can never exceed (i.e., be
greater than) the job security profile. This is specified as,

    $A_h(\infty) \to A_j$    (20)

    $C_h(\infty) \to C_j$    (21)

For e = 0, we see the properties initialized to their
simplest form. However, as e gets large, the histories
accumulate, but never exceed the upper limit set by the
job. $A_h(e)$ and $C_h(e)$ are important new concepts,
discussed in further detail later. We speak of them,
affectionately, as the security “high-water mark,” with
analogy to the bath tub ring that marks the highest
water level attained.

The Franchise of a new file is always obtained from
the Franchise of the job given by equation (6). When
$i = \mu = 0$, the job is controlled by the single user $u_j$, who
becomes the owner and creator of the file with the sole
Franchise for the file.
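
The property-determination rules can be sketched as follows. This is an illustrative model only, assuming integer-coded Authority with 0 as the lowest level and Python frozensets for Category; the class and function names are hypothetical and do not describe ADEPT's internal tables.

```python
from dataclasses import dataclass


@dataclass
class Job:
    authority: int                 # A_j
    category: frozenset            # C_j
    owner: str                     # the user controlling the job
    authority_history: int = 0     # A_h(0), assumed to be encoded as 0
    category_history: frozenset = frozenset()   # C_h(0) is the empty set

    def record_access(self, file_authority, file_category):
        """Update the high-water mark after a granted file access."""
        self.authority_history = max(self.authority_history, file_authority)
        self.category_history = self.category_history | frozenset(file_category)

    def new_file_profile(self):
        """Profile given to a newly created file: the histories plus the job's user."""
        return self.authority_history, self.category_history, self.owner


def derive_job(user_authority, user_category, term_authority, term_category, owner):
    """Job Authority is the minimum, job Category the intersection, of user and terminal."""
    return Job(min(user_authority, term_authority),
               frozenset(user_category) & frozenset(term_category),
               owner)
```

The sketch relies on the caller recording only accesses that have already been granted; under that condition the histories can never exceed the job's own Authority and Category.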


Access control

Our model is now rich enough to express the equations
of access control. We wish to control access by a user to
the system, to a terminal, and to a file. Access is granted
to the system if and only if

    $u \in U$    (22)

where U is the set of all sanctioned users known to the
system.

Access is granted to a terminal if and only if

    $u \in F_t$    (23)

If equations (22) and (23) hold, then by definition

    $u = u_t = u_j$    (24)

Access is granted to a file if and only if

    $P_j \ge P_f$    (25)

for properties A and C according to equations (8) and
(9), and

    $u_j \in F_f$    (26)

If equations (25) and (26) hold, then access is granted
and $A_h(e)$ and $C_h(e)$ are calculated by equations (12)
and (14).

Model interpretation

Three different dimensions for restricting access to
sensitive information and information processes are
possible with the security profile triplet. The generality
of this technique has considerable application to public
and military systems. For the system of interest,
however, the Authority property corresponds to the Top
Secret, Secret, etc., levels of government and military
security; Category corresponds to the host of special
control compartments used to restrict access by project
and area, such as those of the Intelligence and Atomic
Energy communities; and the Franchise property
corresponds to access sanctioned on the basis of
need-to-know. With this interpretation, the popular
security terms “classification” and “clearance” can be
defined by our model in the same dimensions, as a
min/max test on the security profile triplet.
Classification is attached to a security object to designate the
minimum security profile required for access, whereas
clearance grants to a security object the maximum
security profile it has permission to exercise. Thus, legal
access obtains if the clearance is greater than or equal
to the classification, i.e., if equation (25) holds.
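
The three access tests can be written down directly once the profile triplet is represented as data. The sketch below is an assumption-laden illustration (integer Authority, frozenset Category and Franchise, hypothetical names), not the ADEPT implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    authority: int          # numerically encoded Authority
    category: frozenset     # Category compartments
    franchise: frozenset    # users holding Franchise


def may_enter_system(user, sanctioned_users):
    """System access: the user must be in the set of sanctioned users."""
    return user in sanctioned_users


def may_use_terminal(user, terminal):
    """Terminal access: the user must hold the terminal's Franchise."""
    return user in terminal.franchise


def may_access_file(job, job_user, file):
    """File access: job clearance must cover the file classification in
    Authority and Category, and the job's user must hold the file's Franchise."""
    return (job.authority >= file.authority
            and job.category >= file.category
            and job_user in file.franchise)
```

In this reading the job profile plays the role of the clearance and the file profile the role of the classification, so a request is legal only when every one of the three dimensions passes.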

Another observation on the model is the “job
umbrella” concept implied by equations (22) through
(26); i.e., the derived clearance of the job (not the
clearance of the user) is used as the security control
triplet for file access. The job umbrella spreads a
homogeneous clearance to normalize access to a
heterogeneous assortment of program and data files.
This simplifies the problem of control in a multi-level
security system. Also note how the job umbrella's
high-water mark (equations (11) through (14)) is used
to automatically classify new files (equations (17) and
(18)); this subject is discussed further below.

A final observation on the model is its application of
need-to-know to terminal access, equation (23). This
feature allows terminals to be restricted to special
people and/or special groups for greater control of
personnel interfaces, i.e., systems programmers,
computer operators, etc.

Security control implementation 

The selection of a set-theoretic model of security
control was not fortuitous, but a deliberate choice biased
toward computational efficiency and ease of
implementation. It permits the clean separation and isolation of
security control code from the security control data,
which enables ADEPT's security mechanisms to be
openly discussed and still remain safe, a point
advocated by others.¹⁴,¹⁰ We achieve this safety by “arming”
the system with security control data only once at
start-up time by the SYSLOG procedure discussed later.
Also, the model improves the credibility of the security
system, enhancing its understanding and thereby
promoting its certification.

Security objects: Identity and structure 

Each security object has a unique identification (ID) 
within the system such that it can be managed individu- 
ally. The form of the ID depends upon the security- 
object type; the syntax of each is given below. 


User identification 

For generality of definition, each user is uniquely 
identified by his user:id , which must be less than 13 
characters with no embedded blanks. 

The user:id can be any meaningful encoding for the 
local installation. For example, it can be the individual's 
Social Security number, his military serial number, his 
last name (if unique and less than 13 characters), or 
some local installation man-number convention. The set 
of all user:ids constitutes the universal set, U . 

Terminal identification 

All peripheral devices in ADEPT are identified
uniquely by their IBM 360 device addresses. Besides
interactive terminals, this includes disc drives, tape
drives, line printer, card reader-punch, drums, and 1052
keyboard. Therefore, terminal:id must be a two-digit
hexadecimal number corresponding to the unit address
of the device.

Job identification 

ADEPT consists of two parts: the Basic Executive 
(BASEX), which handles the allocation and scheduling 
of hardware resources, and the Extended Executive 
(EXEX), which interfaces user programs with BASEX. 
ADEPT is designed to operate itself and user programs 
as a set of 4096-byte pages. BASEX is identified as 
certain pages that are fixed in main core, whereas EXEX 
and user programs are identified as sets of pages that 
move dynamically between main and swap memory. 
A set of user programs are identified as a job, with page 
sets for each program (the program map) described in 
the job's environment area, i.e., the job's “state tables.” 
Every job in ADEPT has an environment area that 
is swapped with the job. It contains dynamic system 
bookkeeping information pertinent to the job, including 
the contents of the machine registers (saved when the 
job is swapped out), internal file and I/O control tables, 
a map of all the program's pages on drum, user:id , and 
the job security control parameters. The environment 
page(s) are memory-protected against reading and 
writing by user programs, as they are really swappable 
extensions of the monitor's tables. 

The job:id is then a transitory internal parameter 
which changes with each user entrance and exit from the 
system. The job:id is a relative core memory address 
used by the executive as a major index into central 
system tables. It is mapped into an external two-digit 
number that is typed to the user in response to a
successful LOGIN.

File identification

ADEPT's file system is quite rich in the variety of
file types, file organization, and equipment permitted.
There are two file types: temporary and permanent.

Temporary files are transitory “scratch” disc files,
which disappear from the system inventory when their
parent job exits from the system. They are always
placed on resident system volumes, and are private to
the program that created them.

Permanent files constitute the majority of files 
cataloged by the system. Their permanence derives from 
the fact that they remain inventoried, cataloged, and 
available even after the job that created or last refer- 
enced them is no longer present, and even if they are not 
being used. Permanent files may be placed by the user 
on resident system volumes or on demountable private 
volumes. 

There are six file organizations from which a user may
select to structure the records of his file:
physical-sequential, S1; non-formatted, S2; index-sequential, S3;
partitioned, S4; multiple volume fixed record, S5; and
single volume fixed record, S9. Regardless of the
organization of the records, ADEPT manages them as a
collection, called a file. Thus, security control is at the
file level only, unlike more definitive schemes of
sub-element control.⁸,¹⁰,¹²

All the control information of a file that describes 
type, organization, physical storage location, date of 
creation, and security is distinct from the data records 
of the file, and is the catalog of the file. 

All cataloged ADEPT files are uniquely identified by 
a four-part name; each part has various options and 
defaults (system assumptions). This name, the file:id, 
has the following form: 

file:id ::= name, form, user:id, volume:id 

Name is a user-generated character string of up to 
eight characters with no embedded blanks. It must be 
unique on a private volume as well as for Public files 
(described below). 

Form is a descriptor of the internal coding of a file. 
Up to 256 encodings are possible, although only these 
seven are currently applicable: 

1 = binary data 
2 = relocatable program 
3 = non-relocatable program 
4 = card images 
5 = catalog 
6 = DLO (Delayed Output) 
7 = line images 

User:id corresponds to the owner of the file, i.e., the 
creator of the file. 

Volume:id is the unique file storage device (tape, disc, 
disc pack, etc.) on which the file resides. For various 
reasons, including reliability, ADEPT file inventories 
are distributed across the available storage media, 
rather than centralized on one particular volume. Thus, 
all files on a given disc volume are inventoried on 
that volume. 

Security properties: Encoding and structure 

Implementation of the security properties in ADEPT 
is not uniform across the security objects as suggested 
by our model, particularly the Franchise property. Lack 
of uniformity, brought about by real-world considera- 
tions, is not a liability of the system but a reflection of 
the simplicity of the model. Extensions to the model are 
developed here in accordance with that actually 
implemented in ADEPT. 

Authority 

Authority is fixed at four levels (to = 3 for equation 
(1)) in ADEPT, specifically, UNCLASSIFIED, CON- 
FIDENTIAL, SECRET, and TOP SECRET in 
accordance with Department of Defense security 
regulations. The Authority set is encoded as a logical 
4-bit item, where positional order is important. Magni- 
tude tests are used extensively, such that the high-order 
bits imply high Authority in the sense of equation (8). 

Category 

Category is limited to a maximum of 16 compart- 
ments (^ < 15 for equation (2)), encoded as a logical 
16-bit item. Boolean tests are used exclusively on this 
datum. The definition of (and bit position correspond- 
ence to) specific compartments is an installation option 
at ADEPT start-up time (see SYSLOG). Typical 
examples of compartments are EYES ONLY, 
CRYPTO, RESTRICTED, SENSITIVE, etc. 

Franchise 

Property Franchise corresponds to the military 
concept of need-to-know. Essentially, this corresponds 
to a set of user:ids; however, the ADEPT implementation 
of Franchise is different for each security object: 

1. User: All users wishing ADEPT service must be 
known to the system. This knowledge is imparted 
by SYSLOG at start-up time and limited to 
approximately 500 user:ids (max(U) < 500). 

2. Terminal: Equation (5) specifies the Franchise 
of a given terminal, $F_t$, as a set of user:ids. In 
ADEPT, $F_t$ does not exist as such. One may define all 
the users for a given terminal, i.e., $F_t$; or alternatively, 
all the terminals for a given user. Because 
SYSLOG orders its tables by user:id, the latter 
definition was found more convenient to 
implement. 

3. Job: The Franchise of a job is the user:id of the 
creator of the job at the time of LOGIN to the 
system. Currently, only one user has access to 
(and control of) a job (m = 0 for equation (6)). 

4. File: Implementation of Franchise for a file ($F_f$) 
is more extensive than equation (7). In ADEPT, 
we wish to control not only who accesses a file, 
but also the quality of access granted. We have 
defined a set of four exclusive qualities of access, 
such that a given quality, $q$, is defined if 

$$q \in \{\text{READ},\ \text{WRITE},\ \text{READ-AND-WRITE},\ \text{READ-AND-WRITE-WITH-LOCKOUT-OVERRIDE}\} \tag{27}$$

ADEPT permits simultaneous access to a file by 
many jobs if the quality of access is for READ 
only. However, only one job may access a file 
with WRITE or READ-AND-WRITE quality. 
ADEPT automatically locks out access to a file 
being written to avoid simultaneous reading and 
writing conflicts. A special access quality, however, 
does permit lockout override. Equation (7) 
can now be extended as a set of pairs, 

$$F_f = \{(u^0, q^0),\ (u^1, q^1),\ \ldots,\ (u^\gamma, q^\gamma)\} \tag{28}$$

where the $q^i$ are not necessarily distinct and are given 
by equation (27). 

The implementation of equation (28) is dependent 
upon $\gamma$, the number of franchised users. 
When $\gamma = 0$, we have the ADEPT Private file, 
exclusive to the owner, $u^0$; for $\gamma = \max(U)$, we 
have the Public file; values of $\gamma$ between these 
extremes yield the Semi-Private file. $\gamma$ is 
implicitly encoded as the ADEPT “privacy” 
item in the file's catalog control data, and takes 
the place of $F_f$ for all cases except a Semi-Private 
file. For that case exclusively, equation (28) holds 
and an actual $F_f$ list of (user:id, quality) pairs 
exists as a need-to-know list. The owner of a file 
specifies and controls the file's privacy, including 
the composition of the need-to-know list. 
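
The lockout rule described in item 4 can be sketched as follows. This is a minimal illustration, not ADEPT's actual code: the class `OpenFile`, its fields, and the integer job identifiers are hypothetical, and two behaviors are assumptions rather than statements from the text, namely that a write request is refused while readers hold the file, and that the override quality is admitted without changing the lock state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Quality(Enum):
    READ = "READ"
    WRITE = "WRITE"
    READ_AND_WRITE = "READ-AND-WRITE"
    OVERRIDE = "READ-AND-WRITE-WITH-LOCKOUT-OVERRIDE"


@dataclass
class OpenFile:
    readers: set = field(default_factory=set)  # jobs holding READ access
    writer: object = None                      # job holding write access, if any

    def request(self, job, quality):
        """Return True (and record the access) if the lockout rule admits it."""
        if quality is Quality.OVERRIDE:
            # Assumption: override bypasses the lockout and leaves the lock state alone.
            return True
        if quality is Quality.READ:
            if self.writer is not None:
                return False          # file is being written: readers are locked out
            self.readers.add(job)
            return True
        # WRITE or READ-AND-WRITE: assumed to need exclusive use of the file.
        if self.writer is not None or self.readers:
            return False
        self.writer = job
        return True
```

With this sketch, any number of READ requests succeed on an idle file, while a second job asking for WRITE is refused until the file is free.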


Security control initialization: SYSLOG 

SYSLOG is a component of the ADEPT initialization 
package responsible for arming the security controls. It 
operates as one of a number of system start-up options 
prior to the time when terminals are enabled. SYSLOG 
sets up the security profile data for user:id and 
terminal:id, i.e., the “given constants” of Table I. 

SYSLOG creates or updates a highly sensitive 
system disc file, where each record corresponds to an 
authorized user. These records are constructed from a 
deck of cards consisting of separate data sets for 
compartment definitions, terminal:id classification, and 
user:id clearance. The dictionary of compartment definitions 
contains the less-than-9-character mnemonic for 
each member of the Category set. Data sets are formed 
from the card types shown in Table II. Use of passwords 
is described later in the LOGIN procedure. 

An IDT card must exist for each authorized user; the 
PWD, DEV, SEC, and CAT card types are optional. 
Other card types are possible, but not germane to 
security control, e.g., ACT for accounting purposes. 
More than one PWD, DEV, and CAT card is acceptable 
up to the current maximum data limits (i.e., 64 passwords, 
48 terminal:ids, and 16 compartments). 

A variety of legality checks for proper data syntax, 
quantity, and order are provided. SYSLOG assumes the 
following default conditions when the corresponding 
card type is omitted from each data set: 


Card omitted    Default assumed 
PWD             No password required 
DEV             All terminal:ids authorized 
SEC             A = UNCLASSIFIED 
CAT             C = null (all-zero mask) 


This gives the lowest user clearance as the default, 
while permitting convenient user access. Various options 
exist in SYSLOG to permit maintenance of the internal 
SYSLOG tables, including the replacement or deletion 
of existing data sets in total or in part. 
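
The default rules above can be sketched in a few lines. The card representation (a list of `(card_type, values)` pairs), the function name `build_profile`, and the dictionary layout are all hypothetical; only the card types, the defaults, and the limits of 64, 48, and 16 come from the text. Here `None` stands for "no password required" and "all terminals authorized".

```python
AUTHORITY_LEVELS = ["UNCLASSIFIED", "CONFIDENTIAL", "SECRET", "TOP SECRET"]
MAX_PASSWORDS, MAX_TERMINALS, MAX_COMPARTMENTS = 64, 48, 16


def build_profile(cards):
    """Build one user's security profile from (card_type, values) pairs."""
    if not cards or cards[0][0] != "IDT":
        raise ValueError("a user data set must start with an IDT card")
    profile = {
        "user_id": cards[0][1][0],
        "passwords": None,            # PWD omitted: no password required
        "terminals": None,            # DEV omitted: all terminal:ids authorized
        "authority": "UNCLASSIFIED",  # SEC omitted: lowest Authority
        "category": set(),            # CAT omitted: null Category set
    }
    for card_type, values in cards[1:]:
        if card_type == "PWD":
            profile["passwords"] = (profile["passwords"] or []) + list(values)
        elif card_type == "DEV":
            profile["terminals"] = (profile["terminals"] or set()) | set(values)
        elif card_type == "SEC":
            if values[0] not in AUTHORITY_LEVELS:
                raise ValueError("unknown Authority: " + values[0])
            profile["authority"] = values[0]
        elif card_type == "CAT":
            profile["category"] |= set(values)
        # Other card types (e.g., ACT) are not security-relevant and are skipped.
    if profile["passwords"] is not None and len(profile["passwords"]) > MAX_PASSWORDS:
        raise ValueError("too many passwords")
    if profile["terminals"] is not None and len(profile["terminals"]) > MAX_TERMINALS:
        raise ValueError("too many terminal:ids")
    if len(profile["category"]) > MAX_COMPARTMENTS:
        raise ValueError("too many compartments")
    return profile
```

A data set consisting of an IDT card alone therefore yields an UNCLASSIFIED user with an empty Category set who may log in from any terminal without a password.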

The sensitivity of the information in the security 
control deck is obvious. Procedures have been developed 
at each installation that give the function of deck 
creation, control, and loading to specially cleared 
security personnel. The internal SYSLOG file itself is 
protected in a special manner described later. 

Access control 

A fundamental security concern in multi-access systems 
is that many users with different clearances will be 
simultaneously using the system, thereby raising the 
possibility of security compromise. Since programs are 
the “active agents” of the user, the system must 
maintain the integrity of each and of itself from 
accidental and/or deliberate intrusion. A multifile 
system must permit concurrent access by one or more 
jobs to one or more on-line, independently classified files. 

TABLE II: SYSLOG control cards 

Card type                               Purpose 
DICT                                    Identifies start of data set of compartment definitions. 
compartment_1 ... compartment_16        Defines up to 16 compartments. 
TERMINAL                                Identifies start of data sets of terminal definitions. 
UNIT terminal:id                        Identifies start of a terminal data set. 
IDT user:id                             Identifies start of a user data set. 
PWD password ... password               Defines legal passwords for user:id, up to 64. 
DEV terminal:id_1 ... terminal:id_48    Defines legal terminals for user:id, up to 48. 
SEC Authority                           Defines user:id Authority. 
CAT compartment_1 ... compartment_16    Defines user:id Category set. 

ADEPT is all these things: a multiuser, multiprogram, 
and multifile system. Thus, this section deals with access 
control over users, programs, and files. 

User access control: LOGIN 

To gain admittance to the system, a user must first 
satisfy the ADEPT LOGIN decision procedure. This 
procedure attempts to authenticate the user in a fashion 
analogous to challenge-response practices. 

The syntax of the ADEPT LOGIN command, typed 
by a user on his terminal, is as follows: 

/LOGIN user:id password accounting 

Figure 1 pictorially displays the LOGIN decision 
procedure based upon the user-specified input parameters. 
User:id is the index into the SYSLOG file used to 
retrieve the user security profile. If no such record exists 
(i.e., equation (22) fails), the LOGIN is unsuccessful and 
system access is denied. If the security profile is found, 
LOGIN next retrieves the terminal:id for the keyboard 
in use from internal system tables, and searches for a 
match in the terminal:id list for which the user:id was 
franchised by SYSLOG. An unsuccessful search is an 
unsuccessful LOGIN. 

If the terminal is franchised, then the current password 
is retrieved from the SYSLOG file for this user:id 
and matched against the password entered as a keyboard 
parameter to LOGIN. An unsuccessful match is again 
an unsuccessful LOGIN. Furthermore, the terminal is 
ignored (will not honor input) for approximately 30 
seconds to frustrate high-speed, computer-assisted 
penetration attempts. If, however, the match is 
successful (equation (22) holds), the current password in 
the SYSLOG file for this user:id is discarded and 
LOGIN proceeds to create the job clearance. 



Figure 1 — LOGIN decision procedure 
Passwords in ADEPT obey the same syntax conventions 
as user:id. (See the earlier description of User 
Identification.) Although the limit is easily increased, 
SYSLOG currently permits up to 64 passwords. Each successful 
LOGIN throws away the user password; 64 successful 
LOGINs are possible before a new set of passwords 
need be established. If other than random, once-only 
passwords are desired, the 64 passwords may be encoded 
in some algorithmic manner, or replicated some number 
of times. The use of once-only passwords is an easily implemented 
technique for user authentication, which has been 
advocated by others. 2,7 It is a highly effective and 
secure technique because of the high permutability of 
12-character passwords and their time and order 
interdependence, known only to the user. 
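
The LOGIN decision procedure with once-only passwords can be sketched as below. The `syslog` dictionary, its field names, and the returned reason strings are hypothetical stand-ins for the SYSLOG file; treating `None` as "no password required" and an exhausted password list as a failed LOGIN are assumptions. The 30-second terminal lockout is noted in a comment rather than implemented.

```python
def login(syslog, user_id, terminal_id, password):
    """Return (ok, reason) following the three tests of the LOGIN procedure."""
    profile = syslog.get(user_id)
    if profile is None:
        return False, "unknown user:id"

    terminals = profile["terminals"]      # None: all terminals authorized
    if terminals is not None and terminal_id not in terminals:
        return False, "terminal not franchised"

    passwords = profile["passwords"]      # None: no password required
    if passwords is not None:
        # The "current" password is taken to be the first unused one in the list.
        if not passwords or password != passwords[0]:
            # ADEPT additionally ignores the terminal for about 30 seconds here.
            return False, "bad password"
        passwords.pop(0)                  # a successful LOGIN discards the password
    return True, "ok"
```

Because each success consumes one password, replaying a captured password fails even when the user:id and terminal are legitimate.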

Once the authentication process is completely satis- 
fied, LOGIN creates the job security profile according to 
equations (15) and (16) of our model. That is, the lower 
Authority of the user and the terminal becomes $A_j$, and 
the intersection (logical AND) of the user and terminal 
Category sets becomes the Category of the job, $C_j$. For 
example, a user with TOP SECRET Authority and a 
Category set (1001 1001 0000 1101) operating from a 
SECRET level terminal with a Category set (0000 0000 
0000 0010) controls a job cleared to SECRET with an 
empty Category set. 
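
The worked example above can be checked mechanically. In this sketch the one-hot 4-bit values for the Authority levels are an assumption (the text states only that the item is 4 bits wide and that higher-order bits mean higher Authority), and `job_clearance` is a hypothetical name.

```python
# Assumed one-hot encoding; only the ordering is taken from the text.
UNCLASSIFIED, CONFIDENTIAL, SECRET, TOP_SECRET = 0b0001, 0b0010, 0b0100, 0b1000


def job_clearance(user_auth, user_cat, term_auth, term_cat):
    """Job Authority is the lower of the two; job Category is their logical AND."""
    return min(user_auth, term_auth), user_cat & term_cat


job_auth, job_cat = job_clearance(
    TOP_SECRET, 0b1001_1001_0000_1101,   # user profile
    SECRET,     0b0000_0000_0000_0010,   # terminal profile
)
print(job_auth == SECRET, job_cat == 0)
```

The magnitude test reduces to `min` on the encoded Authority, and the Category intersection is a single bitwise AND on the 16-bit masks.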

Program access control: LOAD 

As noted earlier, the ADEPT Executive consists of 
two parts: BASEX, the resident part, and EXEX, the 
swapped part. EXEX is a body of reentrant code 
shared by all users; however, it is treated as a distinct 
program in each user’s job. Up to four programs can 
exist concurrently in the job. Each operates with the job 
clearance — the job clearance umbrella. 

LOAD is the ADEPT component used to load the 
programs chosen by the user; it is part of EXEX and 
hence operates as part of the user’s job with the job’s 
clearance. Programs are cataloged files and as such may 
be classified with a given security profile. As is described 
in “File Access Control” below, LOAD can only load 
those programs for which the job clearance is sufficient. 
Once loaded, however, the new program operates with 
the job clearance. 

In this manner, we see the power of the job umbrella 
in providing smooth, flexible user operation concurrent 
with necessary security control. Program files may be 
classified with a variety of security profiles and then 
operate with yet another, i.e., the job clearance. By this 
technique security is assured and programs of different 
classifications may be operated by a user as one job. It 
permits, for example, an unclassified program file (e.g., 
a file editor) to be loaded into a highly classified job to 
process sensitive classified data files. 

File access control: OPEN 

Before input/output can be performed on a file, 
a program must first acquire the file by an OPEN call 
to the Cataloger. Each program must OPEN a file for 
itself before it can manipulate the file, even if the file is 
already OPENed for another program. A successful 
OPEN requires two things: proper specification of the 
file's descriptors (some of which are in the OPEN call, 
others of which are picked up directly by the Cataloger 
from the job environment area, e.g., job clearance and 
user:id), and satisfactory job clearance and user:id 
need-to-know qualifications according to equations (25) 
and (26) of our model. For Authority, equation (25) is 
implemented via equation (8) as a straightforward 
magnitude comparison between $A_j$ and $A_f$. For Category, 
equation (25) is implemented via equation (9) as an equality 
test between $C_f$ and $(C_j \wedge C_f)$. We use $(C_j \wedge C_f)$ to 
ensure that $C_f$ is a subset of the job categories, i.e., the 
job umbrella. Lastly, equation (26) is a NOP if the file 
is Public; a simple equality test between $u_j$ and $u_f$ if the 
file is Private; and a table search of $F_f$ for $u_j$ if the file 
is Semi-Private. These tests do increase processing time 
for file access; however, the tests are performed only 
once at OPEN time, where the cost is insignificant 
relative to the I/O processing subsequently performed 
on the file. 
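
The three OPEN tests can be sketched as one predicate. The function name `may_open`, its parameters, and the string privacy labels are hypothetical; Authority is assumed to be an integer whose magnitude orders the levels, Category a bit mask, and the need-to-know list a mapping from user:id to access quality that is assumed to include the owner.

```python
def may_open(job_auth, job_cat, job_user,
             file_auth, file_cat, privacy, owner, need_to_know=None):
    """Return True if the job passes the Authority, Category, and Franchise tests."""
    # Authority: straightforward magnitude comparison.
    if job_auth < file_auth:
        return False
    # Category: the file's compartments must be a subset of the job's.
    if (job_cat & file_cat) != file_cat:
        return False
    # Franchise: depends on the file's privacy.
    if privacy == "PUBLIC":
        return True                       # no test needed
    if privacy == "PRIVATE":
        return job_user == owner          # simple equality test
    if privacy == "SEMI-PRIVATE":
        return job_user in (need_to_know or {})   # table search
    raise ValueError("unknown privacy: " + privacy)
```

All three tests are cheap bit and table operations, which is consistent with the observation that their cost at OPEN time is small next to the I/O that follows.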

The quality of access granted by a successful OPEN, 
and subsequently enforced for all I/O transfers, is that 
requested, even if the user has a greater Franchise. For 
example, during program debugging, the owner of a file 
may OPEN it for READ access only, even though 
READ-AND-WRITE access quality is permitted. He 
thereby protects his file from possible uncontrolled 
modification by an erroneous WRITE call. 

Considerable controversy surrounds the issue of 
automatic classification of new files formed by subset or 
merger of existing files. The heart of the issue is the poor 
accuracy of many such classification techniques 17 and 
the fear of too many over-classified files (a fear of 
operations personnel) or of too many under-classified 
files (a fear of the security control officers). ADEPT 
finesses the problem with a clever heuristic: most new 
files are created from existing files, hence classify the new 
file as a private file with the composite Authority and 
Category of all files referenced. This is achieved in 
ADEPT by use of the “high-water mark.” 

Starting with the boundary conditions of equations 
(11) and (13), the Cataloger applies equations (12) and 
(14) for each successful file OPEN, and hence maintains 
the composite classification history of all files referenced 
by the job. For each new and temporary file OPEN, the 
Cataloger applies equations (17), (18), and (19); they 
are reapplied for each CLOSE of a new file, to update 
the classification (due to changes in the high-water mark 
since the OPEN) when the file becomes an existing 
cataloged file in the inventory. The scheme rarely 
underclassifies, and tends to overclassify when the new 
file is created late in the job cycle, as shown by boundary 
equations (20) and (21). 
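
One reading of the high-water mark is sketched below, taking "composite" to mean the highest Authority and the union of Categories seen so far. The class name, method names, and the lowest-level starting values are assumptions; the model's actual boundary conditions and update rules are equations (11) through (14) and (17) through (19), which are not reproduced in this section.

```python
class HighWaterMark:
    """Running composite classification of every file a job has OPENed."""

    def __init__(self, authority=0, category=0):
        # Assumed starting point: lowest Authority, empty Category mask.
        self.authority = authority
        self.category = category

    def note_open(self, file_auth, file_cat):
        """Fold a successfully OPENed file into the composite history."""
        self.authority = max(self.authority, file_auth)
        self.category |= file_cat

    def classify_new_file(self):
        """Classification given to a new file at OPEN, and again at CLOSE."""
        return self.authority, self.category, "PRIVATE"
```

Since the mark only rises, a new file closed late in a job inherits the classification of every file touched earlier, which is where the tendency to overclassify comes from.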

Trans-formal security features 

ADEPT contains a host of features that transcend 
the formalism presented earlier. They are described here 
because they are integral to the total security control 
system and form a body of experience from which new 
formalisms can draw. 

Computer hardware 

ADEPT operates on an IBM System 360/50 and is, 
therefore, limited to the hardware available. Studies by 
Bingham 9 suggest a variety of hardware features for 
security control, many of which are possessed by 
System 360. 

IBM System 360 can operate in one of two states: the 
Supervisor state, or the Problem state. ADEPT execu- 
tive programs operate in the Supervisor state; user 
programs operate in the Problem state. 

A number of machine instructions are “privileged” to 
the Supervisor state only. An attempt to execute them 
in the Problem state is trapped by the hardware and 
control is returned to the executive program for 
remedial action. ADEPT disposes of these alarms by 
suspending the guilty job. (A suspended job may be 
resumed by the user.) Clearly, instructions that change 
the machine state are privileged to the executive only. 

Another class of privileged instructions consists of 
those dealing with input/output. Problem state pro- 
grams cannot directly access information files on 
secondary memory storage devices such as disc, tape, or 
drum. They must access these files indirectly by 
requests to the executive system. The requests are 
subjected to interpretive screening by the executive 
software. 

Main memory is selectively protected against unauthorized 
change (write protected). We have also had 
the 360/50 modified to include fetch protection, which 
guards against unauthorized reading of, or executing 
from, protected memory. The memory protect instructions 
are also privileged to the Supervisor state only. 

ADEPT software protects memory on a 4096-byte 
“page” basis (the hardware permits 2048-byte pages), 
allowing a non-contiguous mosaic of protected pages in 
memory for a given program. To satisfy multiprogram- 
ming, many different protection groups are needed. 
Through the use of programmable 4-bit hardware masks, 
up to 15 different protection groups can be accom- 
modated in core concurrently. ADEPT executive 
programs operate with the all-zero “master key” mask, 
permitting universal access by all Basic and Extended 
Executive components. 
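
The key arrangement can be sketched as follows, assuming one 4-bit key per 2048-byte hardware block and a software convention that gives both blocks of a 4096-byte page the same key. The function names and the dictionary of block keys are hypothetical, and the sketch models only the key comparison, not the separate store and fetch protection controls.

```python
HW_BLOCK = 2048   # bytes covered by one hardware protection key
PAGE = 4096       # bytes in one ADEPT page
MASTER_KEY = 0    # all-zero key used by executive programs


def set_page_key(storage_keys, page_no, key):
    """Give both hardware blocks of an ADEPT page the same 4-bit key."""
    if not 1 <= key <= 15:
        raise ValueError("user protection groups use keys 1..15")
    first = page_no * (PAGE // HW_BLOCK)
    for block in range(first, first + PAGE // HW_BLOCK):
        storage_keys[block] = key


def access_allowed(program_key, address, storage_keys):
    """The master key opens everything; otherwise the keys must match."""
    if program_key == MASTER_KEY:
        return True
    return storage_keys.get(address // HW_BLOCK) == program_key
```

Reserving key 0 for the executive leaves the 15 non-zero values of a 4-bit field for concurrent user protection groups, matching the count given above.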

There are five classes of interrupts processed by 
System/360 hardware: input/output, program, super- 
visor call, external, and machine check. Any interrupts 
that occur in the Problem state cause an automatic 
hardware switch to the Supervisor state, with CPU 
control flowing to the appropriate ADEPT executive 
interrupt controller. All security-vulnerable functions 
including hardware errors, external timer and keyboard 
actions, user program service requests, illegal instruc- 
tions, memory protect violations, and input/output, are 
called to the attention of ADEPT by the System/360 
interrupt system. The burden for security integrity is 
then one for ADEPT software. 

Monitor software 

Inducing the system to violate its own protection 
mechanisms is one of the most likely ways of breaking a 
multi-access system. Those system components that 
perform tasks in response to user or program requests 
are most susceptible to such seduction. 

On-Line debugging 

The debugging program provides an on-line capability 
for the professional programmer to dynamically look at 
and change selected portions of his program’s memory. 
DEBUG can be directed to access sensitive core 
memory that would not be trapped by memory protection, 
since, as an EXEX component operating in the 
Supervisor state, DEBUG operates with the memory 
protection master key. To close this “trap door,” 
DEBUG always performs interpretive checks on the 
legality of the debugging request. These checks are 
based upon address-out-of-bounds criteria, i.e., the 
requested debugging address must lie within the user's 
program area. If not, the request will be denied and the 
user warned, but he will not be terminated as has been 
suggested. 7 
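
An address-out-of-bounds check of this kind might look like the sketch below. Representing the user's program area as a set of 4096-byte page numbers is an assumption made for illustration, as are the function and parameter names.

```python
def debug_request_allowed(address, length, program_pages, page_size=4096):
    """True only if every byte addressed lies in a page of the user's program."""
    if address < 0 or length <= 0:
        return False
    first = address // page_size
    last = (address + length - 1) // page_size
    return all(page in program_pages for page in range(first, last + 1))
```

Checking every page the request spans, rather than only the starting address, matters because a program's pages need not be contiguous.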
Input/output 

Input/output in System/360 is handled by a number 
of special-purpose processors, called Selector Channels. 
To initiate any I/O, it is necessary for a channel 
program to be executed by the Selector Channel. 

SPAM, the BASEX component that permits symbolic 
input/output calls from user programs, is really a 
special-purpose compiler that produces I/O channel 
programs from the SPAM calls. These channel programs 
are subsequently delivered to and executed by the ADEPT 
Input/Output Supervisor, IOS. 

SPAM permits a variety of calls to read, write, alter, 
search for, and position to records within cataloged files. 
To achieve these ends, SPAM depends upon a variety 
of control tables dynamically created by the Cataloger 
in the job environment. 

The initiating and subsequent monitoring of channel 
program execution is the responsibility of the BASEX 
Input/Output Supervisor, IOS. IOS is called to execute 
a channel program (EXCP). System components, such 
as SPAM, branch to IOS at a known entry point that is 
fetch-protected against entry in the Problem state. IOS 
is off-limits to user programs attempting to access 
cataloged storage. For protection against unauthorized 
EXCP requests, IOS always performs legality checks 
before executing a channel program. These checks begin 
by examination of the device addressed by the channel 
program. If it is the device address for cataloged 
storage, further checks are made to determine the 
machine state of the calling program. That state must be 
Supervisor state for the call to be honored. A call in the 
Problem state would indicate an illegal EXCP call from 
a user program. 

IOS makes further checks to guarantee the validity of 
an I/O request: it checks to see that the specified buffer 
areas for the transfer do not overlay the channel 
program itself, and lie within the user’s program 
memory area, i.e., do not modify or access system or 
protected memory. 
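
The IOS legality checks can be sketched as a single predicate. Everything named here is hypothetical: memory regions are simplified to half-open `(start, end)` address ranges, the user's program area is treated as one contiguous range, and cataloged storage is identified by membership in a set of device names.

```python
def within(inner, outer):
    """True if range `inner` lies entirely inside range `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if half-open ranges `a` and `b` share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def excp_allowed(device, supervisor_state, buffers, channel_program,
                 user_area, cataloged_devices):
    """Apply the device, machine-state, and buffer checks to an EXCP request."""
    # Cataloged storage may be addressed only by callers in Supervisor state.
    if device in cataloged_devices and not supervisor_state:
        return False
    for buf in buffers:
        if overlaps(buf, channel_program):
            return False      # transfer would overlay the channel program
        if not within(buf, user_area):
            return False      # transfer would touch system or protected memory
    return True
```

The buffer checks apply regardless of machine state, so even a request arriving through SPAM cannot direct a transfer outside the user's own memory.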

Covert I/O violations are also forestalled since I/O 
components take direction from information stored in 
the job environment — an area read- and write-protected 
from Problem state programs. 

Classified residue 

Classified residue is classified information (either code 
or data) left behind in memory (i.e., core, drum, or 
disc) after the program that referenced it has been 
dismissed, swapped out, or quit from the system. The 
standard solution to the problem is to dynamically 
purge the contaminated memory (e.g., overwrite with 
random numbers, or zeros). In a system supporting over 
billion bytes of memory, that solution is unreasonable 
and in conflict with high performance goals. ADEPT’s 
solution to the dilemma of denying access to classified 
residue while maintaining high performance depends 
upon techniques of controlled memory allocation. 

1. Core Residue 

As noted earlier, all core storage is allocated as 
4096-byte pages. These pages are always cleared 
to zero when allocated, thereby overwriting any 
potential residue. 

Via the program’s page map, the ADEPT 
executive system labels all code and data pages 
(they need not be contiguous) belonging to a 
given program with a single hardware memory 
protection key, thereby prohibiting unauthorized 
reading or writing by other, potentially co- 
resident user programs that may be in execution. 
Furthermore, BASEX keeps a running account 
of the status and disposition of all pages of core. 

The Loader and Swapper components of 
ADEPT always work with full 4096-byte pages. 
Unfilled portions of pages at load time are kept 
cleared to zero as when they were allocated, and 
the full 4096 bytes are swapped into core, if not 
already resident, each scheduled time slice. 
Further, newly allocated pages are marked as 
“changed” pages, thus guaranteeing subsequent 
swap out to drum. 

With these procedures, ADEPT denies access 
by a user or program to those pages of core not 
identified as part of his program, and clears core 
residue by over-writing accessible core at load 
and swap times. 

2. Drum Residue 

ADEPT always clears a drum page to zero 
before it is allocated. The page may subsequently 
be cleared again to user-specified data. ADEPT 
also maintains a drum map that notes the 
disposition of all drum pages (800 pages for the 
IBM 2303 drum). Drum input/output, like all 
ADEPT I/O, is controlled by executive privileged 
instructions. 

3. Disc Residue 

Disc files in ADEPT are maintained as 
“dirty” memory. That is, the large capacity of 
the file system makes it infeasible to consider 
automatic over-writing techniques for residue 
control; therefore, deleted disc tracks are re- 
turned to the available storage pool contaminated 
and unclean. It then becomes the burden of the 
ADEPT file system to control any unauthorized 
file access, whether to cataloged files or un- 
cataloged disc memory. 

Teamwork between the Cataloger, SPAM, and 
IOS components of ADEPT achieves this control 
via legality checking of all OPEN and I/O 
requests. 

For example, all disc packs are labeled 
internally and externally with their volume:id, 
and this label is checked at the time of mounting 
by the Cataloger OPEN procedure to assure 
proper volume mounting. Tapes may also be 
labeled and checked as a user option. 

Of particular note, SPAM always assumes that 
an end-of-file (EOF) immediately follows the 
last record written in a new file, and it prohibits 
reading beyond that EOF. Contaminated tracks 
allocated to new files cannot be read until they 
are first written. The act of writing advances the 
EOF and the user simultaneously over-writes the 
classified residue with his own data. The user 
cannot skip over the EOF, and the EOF location 
is itself protected in the job environment area. 

4. Tape Residue 

No special features for tape residue control are 
implemented in ADEPT. Tape residue control is 
easily satisfied by manual, off-line tape de- 
gaussing prior to ADEPT use. 
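
The end-of-file rule described under Disc Residue can be sketched as follows. The class `NewDiscFile` and its record-indexed storage are hypothetical simplifications: the list stands for tracks handed over from the available pool with old contents intact, and `eof` stands for the EOF location held in the job environment area.

```python
class NewDiscFile:
    """A new file on 'dirty' storage, guarded only by its EOF position."""

    def __init__(self, tracks):
        self.tracks = tracks   # allocated storage, possibly holding residue
        self.eof = 0           # number of records written so far

    def write(self, record):
        """Overwrite the next slot and advance the EOF past it."""
        if self.eof >= len(self.tracks):
            raise OSError("no space left in file")
        self.tracks[self.eof] = record
        self.eof += 1

    def read(self, index):
        """Reads are legal only for records the user has already written."""
        if not 0 <= index < self.eof:
            raise PermissionError("read beyond EOF prohibited")
        return self.tracks[index]
```

No track is ever purged in this scheme; residue simply stays unreachable until the user's own data has replaced it.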


System files 

Equation (28) led us to examine Private, Semi- 
Private, and Public files. ADEPT possesses two 
additional file privacies that transcend our model; both 
are system files. Privacy-4 system files are the need-to- 
know lists created by the Cataloger itself for Semi- 
Private files. Privacy-5 system files are private system 
memory for the SYSLOG files and the catalogs 
themselves. 

Access to these files is restricted to the system only. 
Special access checks are made that differ from those of 
equations (25) and (26). First, a special user:id is 
required that is not a member of U (i.e., not in the 
SYSLOG file). Second, the program making the OPEN 
call must be in Supervisor state. Third, the program 
making the OPEN call must be a member of a short list 
of EXEX programs. The list is built into the Cataloger 
at the time of compilation. In this manner, access to 
system files is severely restricted, even to system 
programs. 

Security service commands 

ADEPT provides a variety of service commands that 
involve security control. The commands are listed in 
Table III. Note that commands VARYON, VARYOFF, 
REPLACE, LISTU, AUDIT, AUDOFF, and WRAP- 
UP are restricted to a particular terminal — the Security 
Officer’s Station. 


TABLE III: Security service commands 

Command            Purpose 
AUDIT*             Turns on security audit recording. 
AUDOFF*            Turns off security audit recording. 
CHANGE             Enables the owner of a file to change any of the access control 
                   information of the file. 
CREATE             Enables a user to create a Semi-Private file and its need-to-know list. 
LISTU*             Lists by terminal:id all the currently logged-in user:ids. 
RECLASS            Enables a user to raise or lower his job clearance between the bounds 
                   of the original LOGIN and current high-water mark clearance. 
RELOG              Like LOGIN, but reconnects a user to an already existing job, as when 
                   a remote terminal drops off the communications line. 
REPLACE*           Enables a user to move his job to another terminal or to reclassify a 
                   given device. 
SECURITY           Prints on the user’s terminal approximately every 100 lines (or only by 
                   request) the job high-water mark (or clearance by request) as a reminder 
                   to the user and as a classification stamp of the level of current 
                   security activity. 
VARYON/VARYOFF*    Permits terminals to be varied on- and off-line for flexibility in 
                   system maintenance and configuration control. 
WRAPUP*            Shuts down system after a specified elapsed time. 

* Restricted to Security Officer’s Station only. 
Audit 

The AUDIT function records certain transactions 
relating to files, terminals, and users, and is the elec- 
tronic equivalent of manual security accountability logs. 
Its purpose is to provide a record of user access in order 
to determine whether security violations have occurred 
and the extent to which secure data has been com- 
promised. The AUDIT function may be initiated only 
at start-up time, but may be terminated at any time. 
All data are recorded on disc or tape in real time so the 
data is safe if the system malfunctions. An auxiliary 
utility program, AUDLIST, may be used to list the 
AUDIT file. The information recorded is shown in 
Table IV. 


Implementation of AUDIT is quite straightforward, 
a product of general ADEPT recording and instrumen- 
tation. 18,19 AUDIT is an EXEX component that is 
called by, and at the completion of, each function to be 
recorded. The information to be recorded is passed to 
AUDIT in the general registers. Additional I/O 
overhead is the primary cost incurred in the operation 
of AUDIT, for swapping and file maintenance. This 
cost is nominal, however, amounting to less than one 
percent of the CPU time. 

TABLE IV: Security events and information audited by ADEPT-50 

[Table body not recoverable from the source; the surviving table notes follow.] 

This is the "OPEN existing file" command. 

A list of all the terminal devices and their assigned security and categories is recorded at each system load. 
A list of the prose category names is recorded at each system load. 

Whenever the system is restarted on the same day (and AUDIT had been turned on earlier that day) the time of 
the restart is recorded. 

The time that the AUDOFF action was taken, or the time that the WRAPUP function called AUDIT, to terminate the 
AUDIT function. 

SUMMARY 

In summary we may ask: How well have we met our 
goals? First, we believe we have developed and 
successfully demonstrated a security control mechanism that 
more than adequately supports heterogeneous levels and 
types of classification. Of note in this regard is the 
LOGIN decision procedure, access control tests, job 
umbrella, high-water mark, and audit trails recording. 
The approach can be improved in the direction of more 
compartments (on the order of 1000 or more), extension 
of the model to include system files, and the imple- 
mentation of a single Franchise test for all security 
objects. The implementation needs redundant encoding 
and error detection of security profile data to increase 
confidence in the system — though we have not ourselves 
experienced difficulty here. The increase in memory 
requirements to achieve these improvements may force 
numerical encoding of security data, particularly 
Category, as suggested by Peters. 7 

Second, SYSLOG has been highly successful in 
demonstrating the concept of “security arming” of the 
system at start-up time. Our greatest difficulty in this 
area has been with the human element — the computer 
operators — in preparing and handling the control deck. 
In opposition to Peters, 7 we believe the operator should 
not be “designed out of the operation as much as 
possible,” but rather his capabilities should be upgraded 
to meet the greater levels of sophistication and responsi- 
bility required to operate a time-sharing system. 20 He 
should be considered part of line management. ADEPT 
is oriented in this direction and work now in progress is 
aimed at building a real-time security surveillance and 
operations station (SOS). 

Third, we missed the target in our attempt to isolate 
and limit the amount of critical coding. Though much 
of the control mechanism is restricted to a few com- 
ponents— LOGIN, SYSLOG, CATALOGER, AUDIT 
—enough is sprinkled around in other areas to make it 
impossible to restrict the omnipotent capabilities of the 
monitor, e.g., to run EXEX in Problem state. Some 
additional design forethought could have avoided some 
of this dispersal, particularly the wide distribution in 
memory of system data and programs that set and use 
these data. The effect of this shortcoming is the need for 
considerably greater checkout time, and the lowered 
confidence in the system's integrity. 

Lastly, on the brighter side, we were surprisingly 
frugal in the cost of implementing this security control 
mechanism. It took approximately five percent of our 
effort to design, code, and checkout the ADEPT 
security control features. The code represents about ten 
percent of the 50,000 instructions in the system. Though 
the code is widely distributed, SYSLOG, security 
commands, LOGIN, AUDIT, and the CATALOGER 
account for about 80 percent of it. The overhead cost of 
operating these controls is difficult to measure, but it is 
quite low, in the order of one or two percent of total 
CPU time for normal operation, excluding SYSLOG. 
(SYSLOG, of course, runs at card reader speed.) The 
most significant area of overhead is in the checking of 
I/O channel programs, where some 5 to 10 msec are 
expended per call (on the average). Since this time is 
overlapped with other I/O, only CPU-bound programs 
suffer degradation. AUDIT recording also contributes 
to service call overhead. In actuality, the net operating 
cost of our security controls may be zero or possibly 
negative, since AUDIT recordings showed us numerous 
trivial ways to measurably lower system overhead. 
INTRODUCTION 

For many years, informed persons have expended 
considerable time and energy attempting to evolve 
an acceptable philosophic assessment of the concept 
of “privacy.” Studies made in the fields of anthropology, 
psychology, and sociology are in general agreement 
that both the mental and physical well-being of an 
individual requires freedom to experience some degree 
of personal anonymity within the environment. 
While the significance of “privacy” has been recognized, 
it has eluded the constraint of an acceptable defini- 
tion. The search for a workable definition continues 
as man seeks a means for establishing practical bounds 
for inter-personal relations. 

Recently, the concern for “privacy” has become a 
rallying point for those who see the present growth 
and applications of data automation as a threat to 
the “rights of privacy” of the individual. These advo- 
cates lament that the individual is unaware of the 
threat to his “loss of privacy” as his attention is 
diverted by the glowing promises of anticipated 
benefits that may become available through data 
automation. 

It is the writer’s belief that through the proper and 
reasonable utilization of the tools of modern data tech- 
nology man will have within his power a mechanism 
that has the potential of becoming his strongest ally 
in his search for means to preserve the values of “pri- 
vacy.” In reality, the critical element in this question 
of “privacy” should not address itself to the electro- 
mechanical capability of the computer or system tele- 
communications functions. The true focal point is the 
direct challenge to the discipline and conduct of man 
who is the designer and user of the data system. 6 Man 
must be willing to abide by the standards he derives 
from his own “privacy” criteria. He must staunchly 
forego any temptation to engage in system shortcuts, 
and he must hold to the position that he will not accept 
lightly any violations of the “confidentiality controls” 
established for system operation. Any breach in the 
integrity of the system must be viewed as a direct 
personal challenge to the integrity of each person 
associated with the undertaking. 

SUMMARY 

The following is a brief resume of significant elements 
that have been identified with the question of “pri- 
vacy.” These comments are not offered as final nor 
are they to be considered as embracing the entire 
area of concern. The summary is presented simply as 
a means of bringing together some key factors that 
could serve as a foundation for a basic “privacy” con- 
trol system. The working standards will evolve as 
man gains more experience with this powerful ally 
and is able to resolve philosophical and ethical ques- 
tions that are inherent in the concept of “privacy”. 
As the environment and pace of modern life adjust 
to current needs, the nature of “privacy” will probably 
also reflect changes in priorities and the character of 
the social stresses. 

Elements in the invasion of privacy 

No definitive statement exists which provides a 
clear and acceptable statement of what is “private 
information,” or what constitutes an “unwarranted 
invasion of privacy.” Any criteria proposed to date 
to identify “private information,” or describe an act 
that would constitute “unwarranted invasion of privacy,” 
must take into account whether or not such 
disclosure of the specific data: 

A. Would relate to an individual, a family or other 
small group in such manner as to facilitate the 
likelihood of the unwarranted identification of 
the individuals, or 

B. Would involve data not considered public information by 
provision of legal statute, or 

C. Would cause or be the basis for unjust economic 
loss or social stigma or harassment to the 
individual, or 

D. Would result in the unnecessary loss of a property 
right. 

What is private vs. what is confidential? 

When attempting to discuss “privacy,” the term 
“confidentiality” inevitably will join the debate, but 
does not promote clarification. What sort of personal 
information do reasonable men interpret as “private?” 
The answer to this question depends upon many 
things; for example, any one or more of the following 
factors may apply: 

A. The context within which the specific information 
is embedded, 

B. The amount of information assembled and ac- 
cessible, 

C. The intrinsic nature of the information, 

D. The sophistication of the social values held by 
the individuals concerned, 

E. The character and scope of the sub-culture, 

F. Significance of personal attributes such as: age, 
ancestry, social status, race, etc. 

Recently, the California Intergovernmental Board 
on EDP was established by statute. 1 It is charged 
with responsibility to provide for intergovernmental 
representation in the coordination of the many govern- 
ment sponsored EDP programs and to take leadership 
in the establishment of intersystem standards. The 
Intergovernmental Board appointed a select Tech- 
nical Advisory Committee to assist in the preparation 
of a Manual to serve as a guideline for all agencies 
in the development of local systems and facilitate 
adequate interface capability as required. The manual 
was completed and is under review by the Inter- 
governmental Board prior to general release to official 
agencies throughout the State of California. 

A sub-committee of the Technical Advisory Com- 
mittee was specifically assigned to address the question 
of “privacy”. The members of the Privacy Sub-committee 
concluded, after some study, that there are a 
number of personal information items that could be 
made accessible to an integrated data system without 
any threat to the individual “privacy”. It was also 
recognized that there are many other data items that 
for one reason or another should be restricted from 
wide access in the absence of an established right to 
know. Some examples of these data items are as shown 
below: 

A. Information that may not be relevant to personal 
privacy: 

Name 
Maiden Name 
Address 
Age or DOB 
Race 
Sex 
Marital Status 
Name of Spouse 
Next of Kin 

B. Information that would probably be relevant to 
personal privacy: 

Occupation 
Education 
Income 
Religious Preference 
Political Preference 
Family Size 
Number of Children 
Ages of Children 
Taxes Paid 
History of Residence 
Attitudes Toward Social Issues 
Property Ownership 
Value of Real Property 
Marital History 
Drinking Practices 
Hospitalization Record 
Medical Record 
Symptoms of Illness 
Record of Arrest 
Ancestry 
Nationality 
Name of Relatives 
Response to Psychological or Medical Questions 

Proliferation of data items throughout culture 

While some of the information items mentioned 
above may be found on records that are classified as 
confidential, many of the information items may also 
be found on records that are not subject to restriction 
by law or policy. The current trend in social intercourse 
and information exchange reflects an ever-broadening 
depth of self-revealment by individuals. 
Private and governmental services are being extended 
into newer areas and thereby attracting the partici- 
pation of an ever-growing segment of the citizenry. 

The integration of interagency information systems 
with data exchange introduces a new dimension as- 
sociated with the creation of composite record images 
of persons known to the total system. These images are 
the product of independent and frequently unrelated 
inputs of data to serve other specific needs. Any 
integrated interagency information system with this 
potential capability must be administered by professionally 
qualified persons who remain sensitive to the 
need to verify both the identification of the subject 
of inquiry and the inquirer’s “right to know”. As more 
data systems are activated and interfaces are estab- 
lished, the individual who is the initial source of the 
data becomes more remote and isolated from the 
operational inquiry that relates to his record. It should 
be the constant aim of the system design, operational 
programming, and user discipline to assure that system 
integrity is not subverted. 

Significance of developing standards for data verification 

Attention should not be directed solely to provide 
for the identification and classification of personal 
data items. What is equally important, standards 
must be developed and adopted to guide data ac- 
ceptance and utilization with respect to the ability 
to verify the information. For example, the confidence 
in the operating system will be increased and utiliza- 
tion encouraged if the user is assured that data items 
are subject to verification as to: 

A. Accuracy 

B. Bias 

C. Completeness 

D. Currency 

E. Documentation 

F. Satisfaction of Legal Requirements 

A safety valve that will support a sound verification 
program is to initiate a practical data purge 
system. The best data system in terms of cost/benefit 
analysis is one that has a high content of active data 
and one that is adequately updated. The effect of estab- 
lishing a continuous and critical purge system is to 
provide an orderly review of file content, to remove 
inactive or low value data. 
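
The continuous purge review described above can be sketched as a simple inactivity test. This is a minimal illustration under stated assumptions: the `Record` type, the `purge_candidates` function, and the one-year idle threshold are hypothetical, and a real installation would add its own criteria for "low value" data.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    key: str          # identifier of the data item (hypothetical)
    last_used: date   # date of the most recent inquiry or update


def purge_candidates(records, today, max_idle_days=365):
    """Return records idle longer than max_idle_days, for review and removal.

    The threshold is an assumed policy parameter, not a value from the paper.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return [r for r in records if r.last_used < cutoff]
```

The function only nominates records; the decision to remove them would remain part of the orderly review of file content that the text calls for.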


One approach to a data classification plan 

A number of studies have been undertaken in an 
attempt to identify and define data items that should 
be processed as classified or confidential. There have 
been perhaps as many solutions offered as there have 
been studies proposed. The Privacy Sub-committee 
mentioned above proposed a simple three category 
data plan for consideration and approval of the Cali- 
fornia Intergovernmental Board on EDP. 2 The con- 
cept is summarized below: 

A. Confidential: 

This classification has the highest level of 
restriction, and should be limited to data which 
is prohibited from free and full disclosure by 
statutory regulation (law). 

B. Restricted: 

This is data which: 

1. Is not prohibited from full and free disclosure 
by statute (that is, it is not confidential), and 

2. Is such that an unauthorized intrusion could 
constitute an unwarranted invasion of personal 
privacy, and 

3. Has been administratively assigned a 
security classification of restricted. 

C. Unclassified: 

All data maintained by a public agency not 
otherwise classified as confidential or restricted 
as defined above. 
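
The three-category plan can be expressed as a small decision rule. The sketch below is illustrative only: the names `Classification` and `classify` and the three boolean inputs are assumptions introduced here, and the Sub-committee's proposal does not prescribe any particular encoding.

```python
from enum import Enum


class Classification(Enum):
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"
    UNCLASSIFIED = "unclassified"


def classify(prohibited_by_statute, privacy_sensitive, administratively_restricted):
    """Apply the three-category plan to one data item.

    prohibited_by_statute:       disclosure is barred by law
    privacy_sensitive:           intrusion could be an unwarranted invasion of privacy
    administratively_restricted: the agency has assigned the restricted classification
    """
    if prohibited_by_statute:
        return Classification.CONFIDENTIAL
    # Restricted requires all of the conditions listed under B.
    if privacy_sensitive and administratively_restricted:
        return Classification.RESTRICTED
    return Classification.UNCLASSIFIED
```

Note that, under this reading, a privacy-sensitive item stays unclassified until an administrator assigns the restricted classification, which is why periodic review of classifications matters.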

Sources of classification criteria 

The criteria for the establishment of classification 
of data arise from a variety of sources. In many instances, 
the criteria are a result of the interaction of 
one or more of the following: 

A. Public Policy: 

The living residue of tradition and social ac- 
ceptance. 

B. Statutory Law: 

The formalized and legal codification of social 
needs and standards of conduct. 

C. Legal Interpretation: 

The implementation of judicial and admin- 
istrative decisions that have been sanctioned 
through public acceptance. 

D. User Agency Specifications: 

Operational decisions that have been adopted 
and enunciated to promote agency goals in 
an atmosphere of public support. 

E. Personal Needs of The Individual: 

Acceptance of the system integrity by the pub- 
lic who participate and furnish personal infor- 
mation to assist an agency function with respect 
to the needs of the individual (Federal Census, 
Social Security, etc.). 

Each of the sources of criteria utilized is subject 
to its own characteristic variations, and will require 
continuous reevaluation. The scope of data items 
subject to the confidential classification is under 
constant adjustment and reassessment due to the 
dynamic character of the social conditions which give 
rise to the data. 

Identification of areas sensitive to intrusion 3 

One of the main deterrents to the development of 
new ideas about privacy has been the lack of specificity 
as to where the threats to privacy may arise. Many 
agree that at some future date, a serious threat may 
develop. That a real danger exists today is not uni- 
versally accepted. 

Let us consider the potential challenge to “privacy” 
that may originate from any of the following sources: 

A. The accidental observance of data by an individual. 

B. The accidental dumping of a volume of confi- 
dential data to general view. 

C. The solitary snoop. 

D. The snoop-for-pay (hired spy). 

E. The file stealer. 

F. Misuse of confidential file by administrator having 
access to system. 

G. Organized crime. 

H. Totalitarian government. 

I. Another possibility might be the intrusion of the 
private sector into government data files. 

Establish policy on data classification 

Before any acceptable automation program can 
be developed to process information that may be considered 
“private” or “confidential,” certain policy 
decisions must be resolved. 

A. The responsible administrators representing users 
of the system must reach agreement on the data 
content of the information system. This agreement 
must include the identification of any 
data items or files that would be subject to 
restricted access or inquiry. If the restriction 
is pursuant to current policy, said policy should 
be specified: 

1. General Public Policy 

2. Agency Administrative Policy 

3. Statutory Provision 

4. Judicial Ruling 

B. Specific criteria should be established based on 
the accepted policy statements, and serve as a 
guide to test the classification of all data intro- 
duced into the system. The continued validity 
of a classification should be based upon periodic 
challenge and justification. 

C. A policy manual should be prepared and main- 
tained as a ready reference to facilitate system 
operation. 

1. Personnel participating in the system should 
be held individually accountable for full 
compliance with the “policy guidelines.” 

2. The policy manual should be subject to 
continuous review and update to remain 
current with system requirements, tech- 
nology, and legal specifications. 

D. Additional considerations in the development of 
an Interagency Information System to main- 
tain privacy control. Decisions regarding the 
following elements of the system design and 
operation will prove significant: 

1. Facility Security: 

   (a) Location of Hardware 
       Single vs. Multiple Facility 
   (b) Physical Adequacy 
       Equipment 
       Personnel 
   (c) Access to Facility 
       Normal 
       Emergency 

2. Equipment: 

   (a) Selection 
   (b) Configuration 
   (c) Operating Characteristics 
       Multi-processing 
       Multi-programming 
       Remote Terminals 

3. Program Control: 

   (a) Single Management Responsibility 
       User Representation and Participation 
   (b) Operating System 
       Monitor of System Services And Access 
   (c) System Applications 
   (d) Man Machine Interface (Key Consideration) 
   (e) Modularization of System Applications 
       Does Modularization Weaken Privacy Control? 
   (f) Integration of Compatible Systems 
       Does Program Control Reside With The Core System? 

4. The Human Factor: 

   This is the critical and perhaps most 
   unpredictable element in the functioning 
   process. 

   (a) Personnel Recruitment, Selection And Appointment 
   (b) Personnel Training And Supervision 
   (c) Maintenance of Operating Discipline 
   (d) Personnel Retention 

Precautions to minimize potential for “privacy” violations 

The same versatility and power that makes the 
computer valuable as a data manipulator can be em- 
ployed to monitor system services and support human 
supervision procedures. The operating information 
system should provide (assuming an adequate system 
analysis and design) : 

A. A Sound Data Classification System 

1. Specify data subject to restricted access 
and special protection. 

2. Provide for isolated storage of restricted 
data if necessary. 

3. Determine who has right to access to 
confidential data and under what operating 
conditions. 

4. User agency personnel should be certi- 
fied for access by administration. 

B. Physical Conditions: 

What levels of control should be imposed to 
promote system integrity and at the same time 
provide a functional environment that will 
encourage system utilization by the participants 
for which it was designed? 

1. Equipment (system hardware): 

   (a) Location and physical security of equipment. 
       (1) Central Computer Installation 
       (2) Associated Peripheral Equipment 
       (3) Back-Up Facilities: Duplicate Files 
   (b) Remote terminal installations (I/O devices). 
   (c) Circuit Security 

2. System Configuration 

   (a) Central Data Bank vs. Dispersed Data Bases 
   (b) Central Data File vs. Central Index Concept 
   (c) Central System Control vs. Remote Terminal Activation 
       (1) Restricted Terminal Operation 
       (2) Multiple Function Remote Terminal 

3. Software System Support: Programming 
   must be developed with an awareness of 
   the need for system integrity and data 
   security. Provision must be made to provide 
   control over basic software components, 
   such as: 

   (a) Program Library 
   (b) Back-Up Documentation 
   (c) Diagnostic And Test Routines 
   (d) Continuous Coding of Update Schedules That Support The 
       Identification Schemes Inherent to The Confidentiality 
       Control Programs 
   (e) Transaction Monitor Logs Should Be Designed to Provide The 
       Basis For Operational Supervision But Not Reveal The 
       Location or Content of The Confidential Files Which Are 
       Subject to Monitor Control 
4. Personnel Requirements: If the system 
   equipment and facilities justify particular 
   planning to minimize the hazards to 
   confidentiality, it is certain that consideration 
   must be given to the personnel who will 
   function in the system. The scope of attention 
   should extend through both the 
   employees who perform the technical 
   services associated with EDP, and the 
   operating personnel of the agency for 
   which the information system was developed. 
   Despite all that has been said 
   heretofore, the “key” to security of information 
   rests with the individuals who 
   have access to the data system. Our 
   personnel planning should encompass 
   many specific areas. The following relate 
   most directly to physical factors: 

   (a) Personal Safety 
       (1) Area Accessibility 
       (2) Emergency Provisions 
   (b) Personal Accountability 
       (1) Identification Control Plan 
           (a) Access to Installation 
           (b) Access to Specific Work Areas 
       (2) Is the Plan Practical, and Is It Used? 
   (c) Conveniences And Necessities 
       (1) Are They Adequate? 
       (2) Are They Properly Located? 
   (d) What Special Precautions Are Warranted When Non-employee 
       Personnel Are Permitted Access to The Installation Area? 
       (1) Equipment Maintenance 
       (2) Building Service Maintenance 

C. System Design Considerations: 

Control provided through specific programming 
techniques. 

1. Limiting Terminal Access to The System: Programming 

   (a) Classification Schedule (Data Level Control) 
       (1) Terminal Identification 
       (2) Terminal Verification 
       (3) User Identification 
       (4) User Verification 
       (5) Call-Back Concept 
   (b) Restriction of Detail of Information in Response to Inquiry 
       (Data Item Control) 
       (1) Refer to Index - Pointer to Source Data 
       (2) Status Indicator 
       (3) Advise Supervisory Station 
           (a) Secure Permission to Interrogate The Restricted File 
           (b) Receive Selected Response Through Monitor Agent 
       (4) Specific Limitation on Terminal Operation 
           (a) Data Input 
           (b) Data Manipulation 
           (c) Data Output 
           (d) Data Change or Update 
           (e) Data Purge 

2. Establish A Monitor On All Terminal 
   Action to Intercept and Identify unauthorized 
   attempts to access the system. 

   (a) Identify Transmitting Terminal And Location 
   (b) Identify Terminal Operator(?) 
   (c) Identify Specific Nature of Restricted Access Attempt 
   (d) Provide For Supervisory Level Notification of The Attempt 
       to Support Maintenance of System Discipline 
   (e) Abort The Unauthorized Attempt to Secure Data 

3. Maintain audit review of selected files to 
   facilitate the orderly purge of files and to 
   check levels of file activity. 

   (a) Establish, as necessary, periodic file review procedures 
       to challenge the continued “confidential” status of 
       individual data items to assure conformity with 
       system policy and user need. 
   (b) Maintain necessary statistical measures of activity in 
       restricted files to document operational policy decisions. 
   (c) Provide special test routines to challenge the 
       confidentiality procedures and verify system 
       functional integrity. 
   (d) The Human Factor: The concern for confidentiality of data 
       and file security eventually will focus on an assessment 
       of problems that arise from the human element in the 
       man-machine system. Despite the sophistication exercised 
       in system analysis, design and implementation, specific 
       recognition must be given to the fact that people 
       participate in system operations. 
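
The terminal access controls outlined under item C (terminal and user verification, data level control, and a monitor that intercepts, reports, and aborts unauthorized attempts) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Monitor` class, its fields, and the numeric ordering in `LEVELS` are hypothetical and are not taken from any system described in the paper.

```python
from dataclasses import dataclass, field

# Assumed ordering: confidential is the most restrictive level.
LEVELS = {"unclassified": 0, "restricted": 1, "confidential": 2}


@dataclass
class Monitor:
    terminals: dict   # terminal id -> highest level the terminal may receive
    users: dict       # user id -> highest level the user is certified for
    alerts: list = field(default_factory=list)  # supervisory notifications

    def request(self, terminal_id, user_id, data_level):
        """Return True if the inquiry may proceed; otherwise log it and abort."""
        terminal_level = self.terminals.get(terminal_id)
        user_level = self.users.get(user_id)
        if terminal_level is None or user_level is None:
            reason = "unknown terminal or user"
        elif LEVELS[data_level] > min(LEVELS[terminal_level], LEVELS[user_level]):
            reason = "insufficient clearance"
        else:
            return True
        # The alert names the terminal, operator, and nature of the attempt,
        # but never the location or content of the protected file.
        self.alerts.append((terminal_id, user_id, data_level, reason))
        return False
```

A request is allowed only when both the terminal and the user are cleared for the level of the data, so a certified user at an uncertified terminal is still refused.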

What about a future computer utility? 4 

With the rapid and diverse growth of computer 
services and recognizing the intimate relation between 
hardware facilities, communication channels and the 
users of the systems, it is no accident that discussion 
should arise about the future establishment of a com- 
puter-communication utility. The need for such a 
service becomes more apparent as we see the intro- 
duction of time-sharing systems and the implemen- 
tation of large integrated data services that support 
major regional and even statewide programs. The 
arguments for and against a 
computer-communication utility are beyond the scope of 
this paper. However, the utility concept does provide 
the opportunity to propose several avenues of approach 
to improving the “privacy” control aspect in personal 
data systems. One of the recurring suggestions has 
been to establish a system of certification and licensing 
for persons directly involved with the design, instal- 
lation, management, and the operation of data systems 
containing sensitive personal information. A second 
device that could prove of value would be to effect 
control through regulation of the computer-communication 
utility service. 

CONCLUSION 

The challenge of privacy control 

Violations of standards regarding confidentiality 
or privacy of information occur when particular items 
of personal data furnished to an information system 
for approved selective use are released to unauthorized 
persons or in a manner that jeopardizes expected 
system integrity. 

A. The Predominance of The Human Factor 

The integrity of any information system re- 
garding confidentiality or invasion of privacy 
will eventually be resolved at the level of the 
human factor. Machines, data sets, file cabinets, 
index cards, tape drives, disk files, memory 
modules, computers, report registers — each of 
these devices is an inanimate object devised by 
man to receive, transfer, or hold information 
items made available to the system through 
human intervention. Data stored in these de- 
vices are significant only insofar as the output 
is meaningful to man, and subject to change 
or exposure by the action of an individual. Data 
stored in an inactive or inaccessible device 
without human interaction will not reveal in- 
formation that would provide the basis for a 
violation of privacy. The relationship between 
man and his information system can be de- 
scribed as consisting of the following basic ele- 
ments: 

(1) Man conceives the system. 

(2) Man builds the elements necessary to provide 
the system. 

(3) Man organizes the elements and establishes 
a scheme of operation. 

(4) Man gathers the data that he introduces 
into the system. 

(5) Man activates the system. 

(6) Man commands the resources of the 
system. 

(7) Man utilizes the results of the system in 
his external contacts in society. 

The consistent factor in the above summation 
is the predominant relationship of man to the 
system. Man is responsible for creation of 
the system, the input of information, the 
manipulation of that information, and the final 
disposition of the data produced or revealed by 
the system. 

B. Personnel Standards Are Necessary 

Due to the prime significance of the human 
element in the integrity of any data automated 
system, the programs must address the following 
problems in a forthright manner: 

(1) Personnel standards must be established 
for all participants. 

(2) All accepted personnel must be indoctrinated 
on a continuing basis regarding 
the system objectives, functions, operational 
responsibility, etc. 

(3) Specific training must be provided regarding 
system participation and 
terminal operation. 

(4) Each installation should have competent 
supervision and a plan of routine 
inspection of operations. 

(5) Each agency participating in a larger 
shared system must be accountable 
for the performance and integrity 
of its representatives. It must also be 
responsible for the release of any 
system information that is received 
from a classified file. 

(6) All personnel who have access to the system 
should be required to sign a voluntary 
statement acknowledging their 
individual responsibility to protect 
the integrity of the system and respect 
the confidentiality of classified data. 
This statement could be a factor in the 
initial as well as continued employment. 4 

The operating system must prove convenient 
and satisfactory to the user. It must provide 
an effective service with assurance as to its 
accuracy and adequacy. Outputs should be 
tailored to meet the user need under the cir- 
cumstances of the inquiry. The efficiency of 
the system should discourage any user devel- 
opment or maintenance of alternate or substi- 
tute systems. The man-machine interface should 
be maintained through the use of simple, direct 
devices with a minimum requirement for coding, 
progressive verification, etc. An automated 
data system should be so designed and sup- 
ported that the user is free to direct his full 
attention to his prime functional responsibility. 


The information system must be a viable and 
practical tool. It should function at the con- 
venience of the user, with intelligible outputs 
consistent in time and content to satisfy the 
service requirement. Where a system re- 
quires specific security restrictions, these must 
be furnished and function without imposing 
any awkward limitation on the legitimate user 
of the system. 

C. Weak Policy And Discipline Result in An Inferior System 

Recent critics have voiced objection to the de- 
velopment of major data banks and interagency 
information sharing systems in government ser- 
vice. Their objection has been based, in part, 
on certain practices associated with private 
credit bureau operations. The lament, properly 
uttered, pointed to a lack of data control and 
exercise of discretion by a number of these 
private agencies. While the economic and social 
value of credit rating bureaus is readily ad- 
mitted, the loose policies regarding “privacy 
of data” cast a shadow regarding the ability 
to maintain integrity in a major information 
system. I believe it is an unfortunate and im- 
proper inference to conclude that public in- 
formation systems cannot protect the “privacy” 
of information due to questionable practices 
among some business organizations established 
to collect and merchandise private informa- 
tion for profit. 

D. Limitation of Data Access of Specific Authorization 

Suggestions have been made that an individual 
should specify the extent of utilization of per- 
sonal information and then the system be re- 
quired to conform to the intention expressed 
by the individual. This proposal sounds rea- 
sonable, but on further consideration, presents 
subsequent problems in data management, 
modification of data use authorization, etc., 
that demand thorough study. 

E. Individual Right of Inspection of Record - File Correction 

Perhaps one of the most practical approaches 
toward satisfaction of individual “right to pri- 
vacy,” and at the same time facilitate the 
availability of the maximum of information resources 
to solve social needs is to make provision 
so that the individual can inspect the 
system files that contain his personal data. 
The individual should also have means to seek 
correction of any data item that is in error and 
subject to biased interpretation. 

F. Develop Realistic Data Purge Policy 

Attention should be given to the development 
of basic guidelines regarding the longevity of 
data resident in a file or information system. 
The current trend is to collect and classify 
more and more data on more and more people. 
While hopefully most of the data will have 
social value, I am sure that a significant quan- 
tity will provide little benefit to the individual 
or the community. It is not too early to con- 
sider the need for sound purge criteria so that 
the data retained in an operating system will 
offer the highest potential return for the energy 
expended. 

G. Adequate Training Programs Must Be Developed
And Employed For The EDP Staff And Personnel
of The User Agency Who Have Occasion
to Engage The Data System

The content should include an introduction to 
system design concepts, the overall functions 
and data processing applications that are com- 
ponents of the system and a thorough instruc- 
tion in terminal man-machine dialog. In ad- 
dition, some attention should be given to ex- 
plaining the service philosophy with particu- 
lar attention to the rules regarding access to 
and utilization of any information from confi- 
dential or restricted files. The legal and moral 
issues must be clearly defined, and an under- 
standing accepted by all who engage the system 
that a violation of the security code regarding
restricted data may be sufficient grounds for
removal from system participation or dismissal.

The training program must be viewed as a con- 
tinuing support function with periodic refresher 
classes, problem sessions, review of privacy 
criteria, etc. It is most important that the 
agency administrators and key supervisory 
personnel become involved in this program, and 
not leave the system discipline task to the technical
staff, who are neither equipped for nor responsible
for this duty.

H. Despite much uncertainty and misgivings as to
the effectiveness in terms of “privacy” control
that will result from the imposition of a licensing
scheme, such a potential mechanism will be the
subject of more intense consideration with the
passage of time.
Model of translator system


Our model of a programming-language translator 
system is represented schematically in the block dia- 
gram of Figure 1. This diagram divides the translator 
system into two components. The first component T is 
a translator program that reads in and translates the 
valid programs of some programming language L. 
The output of the translator is a subset T(L) of the 
intermediate language. The second component is a 
system M for executing the programs translated into 
the intermediate language. It will be seen that, in this 
intermediate language, the operators follow their 
operands in postfix (reverse polish) form, and they are 
relatively machine independent. In this paper, we will
be mainly concerned with defining the operation of
the translator component by specifying the input-output
relationships of the translator for a particular
programming language. These relationships will be 
described in a syntactic notation that is independent 
of the particular translation algorithm used for imple- 
menting the translator T. 

The language that was chosen as an example for this
paper is Wirth and Weber’s EULER (14). EULER is
quite similar to ALGOL 60 in appearance and capabilities,
and it has additional features found in the
LISP list-processing language. The original EULER
syntax was written to conform to the requirements of
a precedence translation algorithm (14), and contains a
number of syntactic rules whose purpose is to facilitate
construction of a precedence translator from these rules.
Because of the presence of these stylized rules, it was
decided to rewrite the EULER grammar into a more
compact and transparent form than the one in which
it originally appeared. An Irons-style notation (2, 3) was
used to specify the translation of this new EULER
grammar.

Figure 1 — Simplified block diagram of a translator
system

Reverse Polish translation of programming languages 

To illustrate what we mean by a syntactic specifi- 
cation of a programming-language translator, let us 
consider as an example the following small portion of 
the EULER syntax and examine some of the basic 
devices used by our EULER system: 



Grammar 1. A Simplified Subset of EULER

Syntactic Rule                                Rule of Translation

<expr> → <var> = <expr>                       <var> <expr> assign
       | <sum>                                I
<sum> → <sum> + <term>                        <sum> <term> add
      | <term>                                I
<term> → <term> * <factor>                    <term> <factor> multiply
       | <factor>                             I
<factor> → (<sum>)                            <sum>
         | at <var>                           <var>
         | <var>                              <var> in
         | <var>.(<expr-sequence>).           <expr-sequence> <var> in
<var> → <name>                                variable <name>
<expr-sequence> → <expr>                      I
                | <expr-sequence>, <expr>     <expr-sequence> <expr>


Note that the rules of translation above refer to 
sequences of symbols on the right parts of syntactic 
rules. In this example, we see that the rules of trans- 
lation specify how symbols and sequences of symbols in 
the source language are rearranged and rewritten in the 
translated language. Where no change at all is indicated 
in the translation of a particular rule, the symbol 
“I” appears as a translation rule. As an example of how 
sequences of symbols are rearranged for translation, the 
infix addition of 

<sum> + <term> 

is translated into the reverse polish sequence of symbols
consisting of a “<sum>” followed by a “<term>”
followed by the intermediate-language command for
adding together the values resulting from evaluation
of the previous two subexpressions. As in good polish
notation, parentheses are removed from around expressions,
and this process is specified by associating
the translation rule “<sum>” with the syntactic rule

<factor> → (<sum>).

The remaining rules having <factor> on the left-hand
side are used for translating arithmetic operands into
the intermediate language. For example, the syntactic
rule

<factor> → <var>

indicates that operands in arithmetic expressions are 
variable names, and the translation of a <var> into 
the sequence 


<var> in 

indicates that the “in” command is used for fetching 
the value associated with <var> and for storing that 
value on top of the run-time operand stack of system 
M. 

The other syntactic rule

<factor> → at <var>

reflects the fact that EULER permits use of program
variables that are pointers to data named by other
program variables. Hence, the effect of the “at” command
of the source language is to suppress the appearance
of “in” in the translated program after the translated
variable name. In this case, a pointer to the data
stored in <var> is left on top of the operand stack in
system M at run time. Finally, the rule

<var> → <name>

means that the names of program variables are translated
into the sequence “variable <name>.” Here, the
effect of the “variable” command is to find a pointer to
the data stored in the following name by system M and
to place this pointer on top of the run-time operand
stack.
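
The run-time behaviour of these commands can be pictured with a small stack machine. The following Python sketch is only an illustration of the description above, not the authors' system M: the names `Ref` and `execute`, the dictionary used as memory, and the assumption that “assign” leaves the assigned value on the stack are all hypothetical.

```python
class Ref:
    """A pointer to a named storage cell (hypothetical model of system M memory)."""

    def __init__(self, memory, name):
        self.memory = memory
        self.name = name


def execute(code, memory):
    """Run a reverse polish command sequence and return the final operand stack."""
    stack = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == "variable":
            # "variable <name>": push a pointer to the data stored in <name>.
            i += 1
            stack.append(Ref(memory, code[i]))
        elif op == "in":
            # "in": replace the pointer on top of the stack by the value it names.
            ref = stack.pop()
            stack.append(ref.memory[ref.name])
        elif op in ("add", "multiply"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "add" else left * right)
        elif op == "assign":
            # Assumption: assignment stores through the pointer and
            # leaves the assigned value on the stack.
            value = stack.pop()
            ref = stack.pop()
            ref.memory[ref.name] = value
            stack.append(value)
        else:
            raise ValueError(f"unknown command {op!r}")
        i += 1
    return stack
```

Under these assumptions, the translated form of A = B + C * D, namely “variable A variable B in variable C in variable D in multiply add assign”, stores B + C * D in A, while a bare “variable X” (the translation of “at X”) leaves a pointer rather than a value on the stack.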

The sequence “<var>.(<expr-sequence>).” on
the right part of the remaining <factor> rule is the
definition of an EULER function call. Function
calls are translated with the parameters preceding the
function name in the translated program. In this way,
the function call can be made to look like a reverse
polish operator having n operands, with n the number of
parameters. A parameterless function call is translated
exactly the same way as a program variable. Thus,
the sequence

“variable <name> in”

in a translated program serves both to fetch data and
to initiate a call on a function, depending on the
<name> involved. This calling sequence will be
referred to in the following discussion of extendible
language features.
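
The rules of Grammar 1 can be exercised with a short program. The sketch below is a plain recursive-descent translator written in Python; it is not the translation algorithm used by the authors (the paper stresses that the notation is independent of the algorithm), and the names `tokenize` and `translate` are hypothetical. Left-recursive rules such as <sum> → <sum> + <term> are handled by iteration, which yields the same output order.

```python
import re

# Tokens of the Grammar 1 subset: ".(" and ")." bracket a function call.
TOKEN = re.compile(r"\s*(\.\(|\)\.|[A-Za-z][A-Za-z0-9]*|[=+*(),])")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def translate(source):
    """Translate an expression of the Grammar 1 subset into reverse polish form."""
    tokens = tokenize(source)
    pos = 0

    def peek(offset=0):
        index = pos + offset
        return tokens[index] if index < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def is_name(token):
        return token is not None and token[0].isalpha() and token != "at"

    def expr():
        # <expr> -> <var> = <expr>  gives  <var> <expr> assign
        if is_name(peek()) and peek(1) == "=":
            name = take()
            take("=")
            return ["variable", name] + expr() + ["assign"]
        return sum_()

    def sum_():
        # <sum> -> <sum> + <term>  gives  <sum> <term> add
        code = term()
        while peek() == "+":
            take()
            code += term() + ["add"]
        return code

    def term():
        # <term> -> <term> * <factor>  gives  <term> <factor> multiply
        code = factor()
        while peek() == "*":
            take()
            code += factor() + ["multiply"]
        return code

    def factor():
        token = take()
        if token == "(":
            code = sum_()          # parentheses disappear in translation
            take(")")
            return code
        if token == "at":
            name = take()
            if not is_name(name):
                raise SyntaxError(f"expected a name after 'at', found {name!r}")
            return ["variable", name]          # pointer only: no "in"
        if is_name(token):
            if peek() == ".(":
                take()
                args = expr()
                while peek() == ",":
                    take()
                    args += expr()
                take(").")
                return args + ["variable", token, "in"]   # parameters first
            return ["variable", token, "in"]
        raise SyntaxError(f"unexpected token {token!r}")

    code = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return " ".join(code)
```

For instance, `translate("F.(A, B).")` places the translated parameters before “variable F in”, so that the call reads like a postfix operator with two operands.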

In the full translation grammar for EULER given 
in Appendix 2, it is possible to see how the methods 
presented in the preceding example are applied to the 
specification of a complete programming language. 
Note that this larger grammar uses, e.g., the symbol 
“+” in place of the “add” instruction of our small 
example, and, in general, translates as many source- 
language symbols as possible directly into commands 
of the intermediate language. The description of EULER 
programming given in Appendix 1 of this paper should 
clarify the meaning of the EULER operators used, 
and the following section in this paper will discuss the 
syntactic methods for optimizing and extending 
EULER as they are developed in the EULER grammar.
A full description of the intermediate reverse-polish
language specified by the EULER rules of
translation can be found in Schneider (10).

Syntactic methods of optimizing expressions 

In the EULER grammar of Appendix 2, the rules of
translation specify that a conditional statement or
expression of the form

“IF <expr>1 THEN <expr>2 ELSE <expr>3”

is translated into its intermediate language version in
the form

“<expr>1 $IF <expr>2 $THEN <expr>3 $ELSE”

Note that each of the expressions here can itself
contain conditional expressions of any desired degree
of nesting, and each of the subexpressions will be rearranged
as shown above. In this intermediate language
the “$IF” command causes an interpretive scan to
the matching “$THEN” label if <expr>1 is false.
Otherwise execution continues until a “$THEN” is
reached, at which point a scan occurs to the “$ELSE”
label that matches this “$THEN”. In this way,
“$THEN” and “$ELSE” behave like balanced parentheses
around expressions, and also serve as place-markers
to which control can be transferred in the
translated program.

This mechanism for executing translated conditional
expressions is used also as the basis for translating
logical expressions into a partially optimized form.
To take an example, the EULER sequence corresponding
to a disjunction is represented by

“<disj> OR <conj>”.

Its translated form is

“<disj> $IF $TRUE $THEN <conj> $ELSE”.

Here, if the first operand “<disj>” of the expression
is true, the entire expression is true. Therefore, the
second operand is evaluated only if the first operand
is false. A similar mechanism is used for the sequence

“<conj> AND <neg>”.

Here, if the first operand is false, the second operand
need not be evaluated. Hence, the translated conjunction
is of the form

“<conj> $IF <neg> $THEN $FALSE $ELSE.”
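
The scanning behaviour of “$IF”, “$THEN” and “$ELSE” can be sketched in a few lines of Python. This is an illustrative model only: the function `run`, the use of a dictionary `env` for operand values, and the list `evaluated` recording which operands were fetched are assumptions introduced for the example, and operands are single names rather than full subexpressions.

```python
def run(program, env):
    """Interpret a postfix token list containing $IF / $THEN / $ELSE.

    Returns the operand stack and the list of operand names that were
    actually evaluated, so that skipped operands are visible.
    """
    stack, evaluated = [], []

    def scan(start, opener, closer):
        # Find the closer that matches, skipping nested opener/closer pairs.
        depth = 0
        for j in range(start, len(program)):
            if program[j] == opener:
                depth += 1
            elif program[j] == closer:
                if depth == 0:
                    return j
                depth -= 1
        raise ValueError(f"no matching {closer}")

    pc = 0
    while pc < len(program):
        token = program[pc]
        if token == "$IF":
            if not stack.pop():
                # Condition false: scan to the matching $THEN label.
                pc = scan(pc + 1, "$IF", "$THEN")
        elif token == "$THEN":
            # End of the true branch: scan to the matching $ELSE label.
            pc = scan(pc + 1, "$THEN", "$ELSE")
        elif token == "$ELSE":
            pass  # only a place-marker when reached by normal execution
        elif token == "$TRUE":
            stack.append(True)
        elif token == "$FALSE":
            stack.append(False)
        else:
            evaluated.append(token)
            stack.append(env[token])
        pc += 1
    return stack, evaluated
```

With the disjunction form `["A", "$IF", "$TRUE", "$THEN", "B", "$ELSE"]`, operand B is fetched only when A is false; with the conjunction form `["A", "$IF", "B", "$THEN", "$FALSE", "$ELSE"]`, B is fetched only when A is true.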

Some syntactic methods of extending EULER

After developing the appropriate techniques for 
translating conditional expressions and for optimizing 
logical expressions, the next order of business is to 
use these syntactic tricks to provide extended facilities 
in the EULER language. The introduction of full 
string-processing facilities into the EULER system is 
the first example to be considered. Without altering 
the EULER interpreter, and with a little reprogram- 
ming of the translator, we can effect the following 
improvement: 


Syntactic Rule                            Rule of Translation

<prim> → <stringprim>                     I
<stringprim> → <stringhead> '             <stringhead> ).
<stringhead> → '
             | <stringhead> <symbol>      <stringhead> .*<symbol>,
Here, a string of arbitrary length is translated into a
list whose cells store the symbols of the string, one
symbol per cell, in sequence. With this arrangement,
it is possible to manipulate strings using the list concatenation
operator provided by EULER, and using
EULER subroutines to perform tests for list equality
and containment.
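
As a rough illustration of this arrangement, the Python sketch below represents a string as a list with one symbol per cell and implements the three operations mentioned. The function names are hypothetical, and Python lists merely stand in for EULER lists.

```python
def string_to_list(text):
    """Store the symbols of a string one per list cell, in sequence."""
    return list(text)


def concat(left, right):
    """Counterpart of list concatenation applied to two string lists."""
    return left + right


def equal(left, right):
    """Test two string lists for equality, cell by cell."""
    return len(left) == len(right) and all(a == b for a, b in zip(left, right))


def contains(whole, part):
    """Test whether `part` occurs as a contiguous run of cells in `whole`."""
    n = len(part)
    return any(whole[i:i + n] == part for i in range(len(whole) - n + 1))
```

For example, `contains(concat(string_to_list("TRANS"), string_to_list("LATOR")), string_to_list("SLAT"))` is true, since the run S, L, A, T spans the point of concatenation.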

The second example involves the addition of facilities
for reading in data at run time within the framework
of the EULER system. In this case, additional
facilities must be provided in the EULER polish string
interpreter. These facilities take the form of routines
for converting numbers into their internal representation
and for packing string data. The added syntax
consists of the following set of rules:


Syntactic Rule                                 Rule of Translation

<program> → .ENTRY <block> .EXIT.              <block>
          | .ENTRY <data>., <block> .EXIT.     <data> <block>
<data> → <datahead> END                        I
<datahead> → DATA <item>                       $DATA <item>
           | <datahead>., <item>               I
<item> → <number>                              I
       | <stringprim>                          I
       | <datalist>                            I
<datalist> → .().                              I
           | <datalisthead> <item>).           I
<datalisthead> → .(                            I
               | <datalisthead> <item>,        I

With this program structure, the data portion could
be read in by a run-time subroutine that leaves the
data in a pre-arranged location of memory. The
interpreter routine could then be read in over the data
routine, and the translated program would be executed.
A statement of the form “READ <prim>” would
then store an appropriate link to some segment of
the read-in <data> on top of the run-time operand
stack.

The third example involves the use of a syntactic
notation to expand the EULER language into a self-extendible
programming language similar to MAD/1
(4) and ALGOL 68 (11). By an extendible programming
language, people currently mean the following two
things.

a. A language in which the programmer can specify
new data types and data structures composed
of novel configurations of data elements.

b. A language in which the programmer is able to
reorder the priorities of expression operators and
is able to specify arbitrary new operations at
will.

In EULER, there already exists a general mechanism
for allowing programmers to manipulate data structures,
namely, the list mechanism. EULER lists can be
constructed from arbitrary combinations of data
elements. However, EULER only has eight data types
with no facilities for extending their ranges. Such range-extension
facilities depend on the machine on which
the language is implemented, and algorithms for specifying
such data types as numbers of arbitrary precision
must be written for the machine in question. Hence,
our example will concentrate on the machine-independent
problem of specifying new operators in programs.

Any reasonable programming language must pre- 
suppose the existence of a standard set of expression 
operators before provision is made for allowing pro- 
grams to expand this set of operators. With each 
standard operator will be associated a standard pre- 
cedence level, and the operators to be introduced by 
the programmer must also have precedence levels. As 
the term is currently used, operator precedence (or 
priority) is a measure of how expression operators 
compare in binding power. For example, exponentiation 
is said to have lower precedence than addition, because 
exponentiation is performed before addition in 
arithmetic expressions. Thus, precedence imposes an 
ordering on the operations of a language. This ordering 
is reflected in the ordering of syntax rules in program- 
ming language grammars. In the EULER grammar 
above, rules are ordered so that list concatenation is 
performed first, then exponentiation, and so on, until 
the operation of value assignment. From concatenation 
to assignment of value there are nine levels of precedence.

Our approach in providing for the programming of
new operations is to assign these operations to one of
nine classes of operators, reflecting the nine levels in
the original grammar. This means that the translator must
now treat operators as though they are procedure calls
that can only be written into the translated program
where their associated precedence level permits their
operations to occur. In order to permit the programmer
to tell the translator what precedence is associated with
a newly defined operator, we require an additional
operator declaration in our language. This declaration,
together with the precedence syntax of expressions
that follows, is sufficient to provide the expanded
operator-definition facility.

Grammar 2. An Expression Grammar for Defining New Operators

Syntactic Rule                                  Rule of Translation

<expr> → <var> <opname> <expr>                  <var> <expr> $VARBL <opname> $IN
       | <disj>                                 I
<disj> → <disj> <opname> <conj>                 <disj> <conj> $VARBL <opname> $IN
       | <conj>                                 I
<conj> → <conj> <opname> <neg>                  <conj> <neg> $VARBL <opname> $IN
       | <neg>                                  I
  ...
<catena> → <catena> <opname> <prim>             <catena> <prim> $VARBL <opname> $IN
         | <prim>                               I
  ...
<blockhead> → <blockhead> <operatordec>.,       <blockhead> <operatordec>
<operatordec> → OPERATOR <opname>               $NEW <opname>
              | <operatordec>, <opname>         <operatordec> $NEW <opname>

<expr> → <opname> = <opdef>                     <opname> <opdef> =
<opdef> → <defhead> <expr> $.                   I
<defhead> → <rankpart> <operandpart>.,          <rankpart> <operandpart>
<rankpart> → RANK OF <digit>.,                  (Not Translated)
<operandpart> → OPERANDS <name>                 $FORMA <name>
              | <operandpart>, <name>           $FORMA <name> <operandpart>
<opname> → <symbol>                             I
         | <opname> <symbol>                    I



In the expression syntax above, the <opname>
in each rule is translated into a procedure call, with
parameters consisting of the one or more operands
associated with each <opname>. These procedure
calls either refer to the “Standard” operator associated
with a particular precedence level or refer to the translated
<opdef> declared by the programmer. It is
assumed that the translator will automatically enclose
each translated program with an extra outer block
containing procedure definitions for the set of standard
operators basic to the language. In this way, the
standard operators can be redefined within a particular
program, but will regain their usual meaning upon exit
from the block in which the redefining statement
occurred. A consequence of this method of allowing
new operator definitions is that program subroutines
may use operators global to their definitions, but may
not have operators passed to them as parameters,
since all assignment of precedence is performed at
translation time.
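
The block-scoped behaviour of operator definitions can be modelled with nested lookup tables. The Python sketch below uses `collections.ChainMap` for the nesting; the class `OperatorScopes`, its method names, and the rank numbers shown are hypothetical and serve only to illustrate an outer block of standard operators that inner blocks may shadow.

```python
from collections import ChainMap
import operator


class OperatorScopes:
    """Operator definitions arranged by block, with the standard set outermost."""

    def __init__(self, standard):
        # The extra outer block holding the standard operator definitions.
        self.table = ChainMap(dict(standard))

    def enter_block(self):
        self.table = self.table.new_child()

    def exit_block(self):
        # Definitions made in the block vanish; outer meanings return.
        self.table = self.table.parents

    def define(self, name, rank, procedure):
        self.table[name] = (rank, procedure)

    def apply(self, name, left, right):
        rank, procedure = self.table[name]
        return procedure(left, right)


# Illustrative standard operators; the rank values are assumptions.
STANDARD = {"+": (4, operator.add), "*": (3, operator.mul)}
```

A redefinition made after `enter_block()` shadows the standard operator until the matching `exit_block()`, after which the operator regains its usual meaning.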

A certain amount of optimization is still possible
within the framework of this extendible translator. As
an example, suppose that we write the following procedure
to correspond to the standard operator for logical
conjunction:

AND = RANK OF 7., OPERANDS X, Y., IF Y
THEN X ELSE FALSE $.

The actual parameters in the procedure call for logical
AND above are expressions surrounded by “.$” and
“$.”. Thus, the effect of the conditional expression in
the operator definition given above is to evaluate the
Y parameter only once and not to evaluate the X
parameter unless Y is true.
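
The effect of passing operands as unevaluated procedures can be imitated in Python with parameterless functions. In this sketch `and_op` mirrors the body “IF Y THEN X ELSE FALSE”; the helper `operand`, which records each evaluation in a list, is a hypothetical device for making the evaluation order visible.

```python
def and_op(x, y):
    """Operator body: IF Y THEN X ELSE FALSE, with X and Y passed as procedures."""
    return x() if y() else False


def operand(name, value, log):
    """Wrap a value as a parameterless procedure that records its evaluation."""
    def procedure():
        log.append(name)
        return value
    return procedure
```

Calling `and_op(operand("X", True, log), operand("Y", False, log))` leaves only "Y" in `log`, since the X operand is never executed when Y is false.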


Programmer-defined syntactic augments to existing languages

As a next step in allowing programmers to decide on
the nature of their own programming languages, we
could conceive of a translator facility for allowing
programmer-specified syntactic and semantic augments
to existing programming languages. The idea behind
this definitional facility is that the translator can be
provided with facilities for accepting new syntactic
rules and associating their right parts with rules of
translation that are essentially calls on global procedures.
The operands within the new syntactic augments
are then translated as parameters supplied to
the procedures for executing the augments. The
feasibility of such augments, provided they do not
lead to problems of syntactic ambiguity, can be inferred
from the algorithms presented in Schneider (9, 10).

As an example of what a programmer might be 
tempted to add to his language, and of the methods he 
could use, we consider the problem of adding ALGOL 
W-style iteration to the EULER language. In the 
following translation grammar, the global procedures 
used in translated programs are “$FOR” and
“$WHILE”, corresponding to the incremented variable
and logical iterations, respectively.


Grammar 3. A Programmer-Defined Syntax of Iterative Statements

Syntactic Rule

(a) <expr> → WHILE <expr>1 DO <expr>2

(b) <expr> → FOR <var> FROM <expr>1 UNTIL <expr>2 BY <expr>3 DO
<expr>4

Rule of Translation

(a) .$<expr>1 $. .$<expr>2 $. $VARBL $WHILE $IN

(b) <var> <expr>1 <expr>2 <expr>3 .$<expr>4 $. $VARBL $FOR $IN

Note that the controlled statement in the syntax
above is translated with procedure definition brackets
“.$” and “$.”. In this way, whenever the corresponding
formal parameter in the “$FOR” or “$WHILE”
procedure definition is executed, the entire controlled
statement is executed as a procedure. The procedure
definitions of “$FOR” and “$WHILE” that follow
are the “semantics” of Grammar 3:

$FOR = .$ FORMAL VAR, EXP1, EXP2, EXP3, STAT.,
    BEGIN LABEL TEST, CYCLE.,
        VAR = EXP1., GO TO TEST.,
        CYCLE.. VAR = VAR + EXP2.,
        TEST.. IF (VAR - EXP3) * SIGN(EXP2) GT 0
            THEN UNDEFINED
            ELSE BEGIN STAT., GO TO CYCLE END
    END $.

$WHILE = .$ FORMAL LOGEXP, STAT.,
    BEGIN LABEL CYCLE.,
        CYCLE.. IF LOGEXP THEN BEGIN STAT., GO TO CYCLE END
            ELSE UNDEFINED
    END $.
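
For readers more familiar with present-day notation, the control flow of these two procedures can be paraphrased in Python. The sketch follows the procedure bodies as printed, so EXP2 acts as the increment and EXP3 as the bound; the class `Cell` (standing in for a variable passed by reference) and the function names are hypothetical, and the controlled statement and the WHILE condition are passed as parameterless functions, in the manner of the “.$ ... $.” brackets.

```python
class Cell:
    """A storage cell standing in for a variable passed by reference."""

    def __init__(self, value=None):
        self.value = value


def sign(x):
    return (x > 0) - (x < 0)


def for_proc(var, exp1, exp2, exp3, stat):
    """VAR = EXP1; repeat STAT and VAR = VAR + EXP2 until VAR passes EXP3."""
    var.value = exp1
    # TEST: leave the loop when (VAR - EXP3) * SIGN(EXP2) is greater than 0.
    while (var.value - exp3) * sign(exp2) <= 0:
        stat()                  # the controlled statement, run as a procedure
        var.value += exp2       # CYCLE: advance the controlled variable


def while_proc(logexp, stat):
    """Repeat STAT for as long as the procedure LOGEXP yields true."""
    while logexp():
        stat()
```

Because the test multiplies by the sign of the increment, the same procedure serves for ascending and descending iteration.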



Figure 2— A portion of the EULER translator 


How a section of the translator was designed — an example 

It is assumed that readers of this section will have
some familiarity with the translator example in the
previous paper (9) on this subject. In order to simplify
the programming of the translator, it was decided to
have the reserved words of the language perform as
many functions as possible in the translation. Thus,
the reserved words actually appear in translations as
commands for the interpretive system where appropriate,
and are stored on the pushdown store of the
translator in place of the “nonterminal symbols” of
the normal-form version of the grammar. For example,
in the normal-form grammar for EULER, the rule
for a conditional expression is

<expr> → X1 <alternative>

X1 → X2 <consequence>

X2 → <condition>

By letting X1 be THEN and X2 be IF in the translator,
the coding is greatly simplified, and no ambiguities
are introduced, since the Xi can be treated as “new
and distinct” symbols of the normal-form grammar.


The flowchart of Figure 2, showing the transitions to
and from the box corresponding to <expr>, illustrates
how the EULER translator was programmed.
Appendix 1

Features of the EULER language 

EULER is a nested block-structure language,
similar to ALGOL. Thus, every block, consisting of a
sequence of statements surrounded by BEGIN and
END parentheses, can be treated as a single statement
in ALGOL fashion. An EULER program consists of
an EULER block preceded by .ENTRY and followed
by .EXIT.

In EULER, there are three declarations. One declaration
is for data variables, one for program labels,
and one for formal parameters of procedures. In the
program

“.ENTRY BEGIN NEW X, Y.,

LABEL Z., ...

Z.. X + Y END .EXIT.”


X and Y will store data, and Z will be a label preceding
some statement.

Assigning a data type to a declared variable is
accomplished by writing an assignment statement with
data of the appropriate type on the right-hand side
of the assignment. Thus, typing of variables in EULER
is dynamic, since any assignment statement can change
the data type stored in a variable. Data typing
is also implicit, since there are no declarations like real,
integer, etc., as appear in ALGOL. The following is a
list of the eight EULER data types:


I. Number — In the EULER system, all numbers are assumed to be floating 
point numbers. The assignment statement 

“V = E.,”

with E a numerical expression or number, causes variable V
to become a numerical variable.


II. Symbol — In this EULER implementation, an assignment statement 
such as 


“V = .*ALPHAN.,” 

causes the six characters “ALPHAN” to be stored in the 
location named by variable V. 


III. Logical — The logical constants are TRUE and FALSE, standing 

respectively for logical truth and falsehood. The assignment 
statement, 

“V = L.,” 

with L a logical constant or logical expression, causes variable 
V to become a logical variable. 

IV. Label — EULER programs use two declarations. “NEW” is used to 

declare a data variable, and “LABEL” is used to declare the 
presence of a label in some block of a program. Interestingly, 
if V is a variable in some EULER block, and V is not in a 
block global to the block of label L, then the assignment 
statement 


“V = L.,” 

causes V henceforth to be of type label, and to be inter- 
changeable with L in GO TO statements. 


V. Reference — In EULER, if V1 is a variable not in a block global to the
block of variable V2, then the assignment statement


“V1 = AT V2.,”

makes V1 a pointer to the data stored in V2. After V1 is
turned into such a pointer, the two statements

“V2 = V2 + 1.,”

and “V1 IN = V1 IN + 1.,”

will have exactly the same effect of manipulating whatever
data is stored in V2.

VI. Procedure — An assignment statement of the form

“V1 = .$ <expr> $.,”

causes V1 to become the name of a parameterless procedure
call with body given by <expr>. As a programming example,
we might consider the following EULER block: “BEGIN
NEW X, Y., X = 2.,

Y = .$ FORMAL Z., X = X + Z $.,

OUT Y.(5). END”

When Y.(5). is operated on by the “OUT” operator, the
value 7.0000 will be written out.

VII. List — In EULER, lists can be constructed in three distinct ways:

(a) On command: “V1 = LIST 300.,”

This statement creates a list of 300 undefined cells and makes
V1 their name.

(b) By explicit notation: “V2 = .(1, .(2, 3)., 4).,”

This statement creates a list consisting of two numbers and a
sublist and makes V2 the name of that list.

(c) By concatenation: “V1 = V1 CONCAT V2.,” Using the
CONCATenation operator, small lists can be joined into
larger ones.

In addition, lists can be subscripted in the same way as
ALGOL arrays, and each element of a list can be any EULER
data type, including label, reference, and procedure. The
following EULER block is a small example of the generality
of the list notation: “BEGIN NEW X, Y., LABEL Z.,

Y = .(2, .$ BEGIN X = X + 1., Y(X) END $.,

.$ OUT X $., Z).,

X = Y(1)., Y(X)., GOTO Y(4).,

Z.. OUT .*FINISH END”

With this program segment, first 3.0000, then FINISH will
be written out by the executed program.

VIII. Undefined — Every variable declared by “NEW” in an EULER program 
is initially of type “UNDEFINED.” In addition, “UN- 
DEFINED” is used as a data constant occasionally and as an 
empty option in conditional statements such as : 

“V = IF LI THEN .(1, 5). ELSE UNDEFINED.,” 

For more details on EULER programming, the reader is referred to the Wirth and
Weber EULER paper (14).
Appendix 2

A new translation grammar for EULER


     Syntactic Rule                                      Rule of Translation

 1:  <program> → .ENTRY <block> .EXIT.                   <block>
 2:  <block> → <blockhead> <body> END                    <blockhead> <body> $END
 3:  <blockhead> → BEGIN                                 $BEGIN
         | <blockhead> <labeldec>.,                      <blockhead> <labeldec>
         | <blockhead> <vardec>.,                        <blockhead> <vardec>
 4:  <vardec> → NEW <name>                               $NEW <name>
         | <vardec>, <name>                              <vardec> $NEW <name>
 5:  <labeldec> → LABEL <name>                           $LABEL <name>
         | <labeldec>, <name>                            <labeldec> $LABEL <name>
 6:  <body> → <body>., <stat>                            I
         | <stat>                                        I
 7:  <stat> → <labdef> <stat>                            I
         | <expr>                                        I
 8:  <labdef> → <name>..                                 $LBDF <name>
 9:  <expr> → GO TO <expr>                               <expr> $GOTO
         | OUT <expr>                                    <expr> $OUT
         | <var> = <expr>                                <var> <expr> =
         | <disj>                                        I
         | <condition> <consequence> <alternative>       I
10:  <condition> → IF <expr>                             <expr> $IF
11:  <consequence> → THEN <expr>                         <expr> $THEN
12:  <alternative> → ELSE <expr>                         <expr> $ELSE
13:  <disj> → <conj>                                     I
         | <disj> OR <conj>                              <disj> $IF $TRUE $THEN <conj> $ELSE
14:  <conj> → <neg>                                      I
         | <conj> AND <neg>                              <conj> $IF <neg> $THEN $FALSE $ELSE
15:  <neg> → <relation>                                  I
         | NOT <relation>                                <relation> $NOT
16:  <relation> → <sum>                                  I
         | <sum>1 <relop> <sum>2                         <sum>1 <sum>2 <relop>
17:  <relop> → EQ | NEQ | GEQ | LEQ | GT | LT            $EQ | $NEQ | $GEQ | $LEQ | $GT | $LT
18:  <sum> → <term>                                      I
         | + <term>                                      <term>
         | - <term>                                      <term> $NEG
         | <sum> {+ | -} <term>                          <sum> <term> {+ | -}
19:  <term> → <factor>                                   I
         | <term> {* | / | ./. | MODULO} <factor>        <term> <factor> {* | / | ./. | $MODUL}
20:  <factor> → <catena>                                 I
         | <factor> ** <catena>                          <factor> <catena> **
21:  <catena> → <prim>                                   I
         | <catena> CONCAT <prim>                        <catena> <prim> $CONCA
22:  <prim> → UNDEFINED                                  $UNDEF
         | <var>                                         <var> $IN
         | <label>                                       <label> $IN
         | (<expr>)                                      <expr>
         | <block>                                       I
         | <procdef>                                     I
         | <referenceprim>                               I
         | <listprim>                                    I
         | <numberprim>                                  I
         | <logicalprim>                                 I
         | TAIL <prim>                                   <prim> $TAIL
         | <var>.(<expr-sequence>).                      <expr-sequence> <var> $IN
         | <symbolprim>                                  I
23:  <label> → <name>                                    $VARBL <name>
24:  <var> → <name>                                      $VARBL <name>
         | <var> IN                                      <var> $IN
         | <var>(<sum-sequence>)                         <var> <sum-sequence>)
25:  <expr-sequence> → <expr>                            I
         | <expr-sequence>, <expr>                       <expr-sequence> <expr>
26:  <sum-sequence> → <sum>                              I
         | <sum-sequence>, <sum>                         <sum-sequence>) <sum>
27:  <referenceprim> → AT <var>                          <var>
28:  <listprim> → <list>                                 I
         | LIST <sum>                                    <sum> $LIST
29:  <list> → .().                                       I
         | <listhead> <expr>).                           I
30:  <listhead> → .(                                     I
31:  <numberprim> → <number>                             $NUMBR <number>
         | REAL <disj>                                   <disj> $REAL
         | LENGTH <catena>                               <catena> $LENGT
         | ABSOLUTE <sum>                                <sum> $ABSOL
         | INTEGER <sum>                                 <sum> $INTEG
32:  <logicalprim> → TRUE                                $TRUE
         | FALSE                                         $FALSE
         | LOGICAL <sum>                                 <sum> $LOGIC
         | <typeinquiry> <var>                           <var> <typeinquiry>
33:  <typeinquiry> → ISNU | ISLO | ISLA | ISLI           $ISNU | $ISLO | $ISLA | $ISLI
         | ISPR | ISRE | ISSY | ISUN                     | $ISPR | $ISRE | $ISSY | $ISUN
34:  <symbolprim> → .* <6-symbol string>                 I
35:  <procdef> → <prochead> <expr> $.                    I
36:  <prochead> → .$                                     .$
         | <prochead> <formaldec>.,                      <prochead> <formaldec>
37:  <formaldec> → FORMAL <name>                         $FORMA <name>
         | <formaldec>, <name>                           $FORMA <name> <formaldec>
38:  <6-symbol string> → {<letter> | <digit> | <blank> | ...}    I
         (i.e., a string of 6 characters.)
39:  <name> → <letter>                                   I
         | <name> <letter>                               I
         | <name> <digit>                                I
         (For the IBM 7094 and the UNIVAC 1108, only the first six characters of a
         <name> are translated.)
40:  <number> → <integer>                                Converted to octal.
         | <integer>.<integer>                           Converted to octal floating point.
41:  <integer> → <digit>                                 -
         | <integer> <digit>                             -
42:  <digit> → 0 | 1 | ... | 9                           I
43:  <letter> → A | ... | Z                              I

INTRODUCTION

The subject of this paper is a general syntax directed 
macro preprocessor system. One of the suggested po- 
tential uses of this system is that of evaluating new or 
extended programming languages by the technique of 
syntax directed macros. This led to the association of 
the acronym SYMPLE (SYntax Macro Preprocessor 
for Language Evaluations) with this system. 

A preprocessor is a processor intended to be used prior 
to another processing stage. In our case, it is assumed 
that the SYMPLE preprocessor system will generally 
be used in processing higher level language texts (ones 
which are user oriented), producing output text in the 
same or a similar higher level language. 

The term “macro” is used in a very general sense in 
this paper. As in other macro systems, the macro mecha- 
nism consists of the recognition of a macro “reference” 
in the source text being processed, and a macro “defi- 
nition” defining a translation procedure invoked by 
some corresponding macro reference. 

A SYMPLE macro definition consists of two parts:
the “macro semantic portion” or “macro body”, and
the “macro templates.”

The macro semantic portion is the translation pro- 
cedure and consists of the instructions to be exe- 
cuted when the macro is “invoked”. A macro is 
invoked when a pattern described in one of its 
macro templates is recognized by the parser in 
the source input text. This macro reference pattern 
may have identifiable parts which are then consid- 
ered as arguments for the semantic portion. 

A macro template defines a possible macro ref- 
erence pattern for this macro and consists of two 
distinct parts: A specification of a general syn- 
tactic substructure of the source input text in which 
a given macro reference may occur (i.e., context), 
and any necessary further syntactic qualifications 
within that general syntactic substructure (e.g., a 
specific pattern). The actual pattern matching 
technique for macro reference is thus a two-level
syntax directed matching procedure. This syntax
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directed macro reference technique is the method 
by which SYMPLE achieves both simplicity and 
generality. 

The SYMPLE system as a macro system is not tied 
to any particular programming language. The base 
(source input) language and the object (output) lan- 
guage of the macro facility could in fact be entirely 
different languages. 

The syntax of the languages to be processed and/ 
or extended must be adequately described through the 
syntax description metalanguage of the SYMPLE 
system. This syntactic description is used for determin- 
ing “context” for macro references and thus the require- 
ments for a minimally “adequate” syntactic description 
of a language are proportional to the degree of context 
required to isolate macro references. 

As a very simple example, assume all macro references 
must occur in only a single specific syntactic unit (syn- 
tactic substructure) of the base language (e.g., only 
labels of Fortran statements). Then to facilitate the 
recognition of macro references in the source language, 
the syntax of the base language need only be described 
via the metalanguage to the extent that it can isolate 
this syntactic unit type (i.e., Fortran labels.) When 
recognized, this syntactic unit will then be considered 
as a candidate for containing a macro reference. 

After a candidate syntactic unit is isolated in the
source input a check can be made for the existence of
specific macro references by testing for further qualifying
patterns within that syntactic unit. For instance,
a Fortran label of “three blanks followed by two numbers”
might be a specific macro reference. A check would
thus be made for this reference according to the syntactic
pattern defining “three blanks followed by two
numbers” whenever a Fortran label is recognized. This
process of local syntax investigation is called “template
matching” for a macro reference.

It is also through the template matching facility 
that translation parameters in the source language 
(e.g., arguments, conditions, etc.) are recognized and 
passed to the actual macro facility. These translation 
parameters, which we shall call argument strings, can 
be manipulated by the instructions contained in the 
body of the macro (semantic portion). 

Since the primary function of the SYMPLE system 
is that of a preprocessor, the translation process is main- 
ly that of a manipulation of argument strings and the 
insertion of modified and/or created strings back into 
the source input. Hence, the actual semantic portion 
of the macro is implemented in a language oriented to 
the manipulation of character strings. Thus translation
due to macro references and related translation parameters
generally results in the insertion of the translation
code in the base language into the body of the
code being processed. It will be shown that this “in
place” translation in the SYMPLE system does not
necessarily imply expansion in exactly the same place
(i.e., at the lexicographical location of the macro
reference).

Figure 1 — A general flow of the SYMPLE macro
preprocessor system

An attempt will now be made to summarize and 
interrelate the functions of the SYMPLE system by 
outlining the system functional flow via a system flow 
diagram (Figure 1) and the following brief description. 

The preprocessor operates as follows : 

1. The first items processed contain control information
which includes such items as the device(s)
from which subsequent information is to be read,
the device(s) designed for system output, the
names of special edit macros, specific listing
options, etc. Control information may occur
in the input stream at other logical stages of
processing.

2. A description of the base language syntactic 
structure is read as input and processed to 
build a data base for the recognition portion. 
This data base will be used later by a parser. 

3. Macros (templates and associated semantic 
translation routines) are read in, stored, and 
used to create necessary data bases for later 
processing. 

4. A source deck is read in and parsing of the 
source input begins. (Probable entry point for 
most users.) 

a. As a syntactic unit is recognized, a check 
is made to see if any macros have templates 
to be matched in this syntactic unit. 
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Templates of edit macros, if any, are tested 
last. When there are no templates left to 
be checked and if the end of the total 
parse has not been encountered, the parse 
is continued. 

b. If a macro template match is successful, 
the argument strings are passed to its 
associated macro semantic portion. There 
may be any number of macro templates 
associated with a given macro semantic 
portion, and identical template patterns 
can be associated with different macro 
semantic portions. 

c. The instructions in the current macro 
semantic portion are executed (actually 
interpreted) and the results of their opera- 
tions are effected (e.g., storage manipu- 
lation, insertion of translation into input 
source, dynamic creation of new macro 
templates or semantics for this or other 
macros). Upon completion of execution 
control is returned to 4a above. 

5. When the source deck has been completely 
parsed and thus source time translations, in- 
cluding any necessary editing, have been com- 
pleted, the file is then ready for output in a 
manner specified by the control information. 

6. Processing is now completed, but by appro- 
priate control information another cycle may 
be initiated on (a) new information or (b) on 
a previous preprocessor output file. Thus, in the 
latter case, we have the possibility of a multi- 
pass preprocessor, if desired. 
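
Step 4 of the flow above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function name `run_pass`, the triple layout of a macro, and the use of regular expressions as templates are all hypothetical, and insertion back into the source (step 4c) is not modeled.

```python
import re

def run_pass(units, macros, edit_macros=()):
    """Offer each recognized syntactic unit to the macro templates.

    units       -- iterable of (unit_name, text) pairs, in parse order
    macros      -- list of (unit_name, compiled_template, semantics) triples
    edit_macros -- same shape; their templates are tested last (step 4a)
    Returns the number of macro invocations.
    """
    invoked = 0
    for unit_name, text in units:
        # Ordinary macros first, edit macros last.
        for target, template, semantics in [*macros, *edit_macros]:
            if target != unit_name:
                continue  # template applies to a different syntactic unit
            match = template.fullmatch(text)
            if match is not None:
                # Step 4b/4c: pass the argument strings to the semantic portion.
                semantics(match.groupdict())
                invoked += 1
    return invoked

# "Three blanks followed by two numbers" inside a label, as in the text.
log = []
label_macro = ("LABEL", re.compile(r" {3}(?P<NUM>[0-9]{2})"),
               lambda args: log.append(args["NUM"]))
run_pass([("LABEL", "   42"), ("LABEL", "  142"), ("STMT", "X = 1")], [label_macro])
print(log)
```

Only the first label matches the template, so the semantic routine is called once with the argument string `42`.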

The remainder of this paper will be devoted in the 
main to the details of what the SYMPLE system can 
do and in general how one goes about using the SYM- 
PLE system. The syntax description metalanguage is 
introduced first followed by an introduction to the 
macro translation (semantic) and insertion capabilities 
of SYMPLE. 

Syntax description metalanguage 

The syntax description metalanguage is used to de- 
scribe a parsing “grammar” of the base language in 
which macro references are to be embedded and thereby 
outline the manner in which the source input is to be 
parsed. For example, suppose a label field is one syn- 
tactic structure to be parsed. The parser should then be 
told that a label field consists of, say, five characters 
which are either all digits, all blanks, or a string of 
blanks followed by a string of digits. 


The grammatical metalanguage used to direct
SYMPLE’s parser is similar to the Backus-Naur
Form 4 (BNF) metalanguage. For example, similar
grammatical productions are used to define syntactic
structures; the nonterminals and terminals of BNF are
also used, being renamed syntactic units and literal
strings, respectively. There are, however, several features
in SYMPLE’s metalanguage which were incorporated
to extend the power and simplicity of grammatical
description over that of standard BNF.

Actual productions in SYMPLE’s metalanguage to 
define the parsing desired in the preceding example are 

(LABEL-FIELD): 5&5(0$' ' 0$(DIGIT))

(DIGIT): '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'

The first production above is interpreted as: a label 
field is defined as not less than five nor more than five 
characters of a string of zero or more blanks immedi- 
ately followed by zero or more digits. 
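
As a rough cross-check of that interpretation, the same constraint can be written as a small Python predicate. This is a sketch, not part of SYMPLE: the name `is_label_field` is hypothetical, and a regular expression stands in for the metalanguage production.

```python
import re

# 0$' ' followed by 0$(DIGIT): zero or more blanks, then zero or more digits.
LABEL_BODY = re.compile(r" *[0-9]*")

def is_label_field(text: str) -> bool:
    """True if text is exactly five characters (5&5): blanks, then digits."""
    return len(text) == 5 and LABEL_BODY.fullmatch(text) is not None

for candidate in ["   42", "     ", "12345", " 4 2 ", "42   "]:
    print(repr(candidate), is_label_field(candidate))
```

The character bound is checked separately from the structure, which mirrors how `5&5` limits the length of whatever the parenthesized group recognizes.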

Productions 

The syntactic units of the base language are defined 
by productions in the metalanguage. These produc- 
tions are of the form: 

(LHS) : right side 

where (LHS) represents the syntactic unit being defined
on the left side and the right side contains metalinguistic
descriptions of other syntactic unit(s) and/or
literal string(s) in the left to right order in which they
comprise the structure of (LHS). The colon (:) separates
the defined syntactic unit on the left side from
the defining information on the right side.

The first production of the base language grammar 
must be the definition of the syntactic unit representing 
the total syntactic structure of the base language (i.e., 
the initial or distinguished symbol of BNF). Other 
productions may be in any order. 

(Named) Syntactic units 

The metalinguistic representation of a syntactic unit
in a production is a string of arbitrary length enclosed
in parentheses. The string (called the name of the
syntactic unit) may be composed of any characters
with the exception of those used as special delimiters
in the syntax description metalanguage (i.e., the illegal
characters are ( ) : / | $ &).
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Literal strings 

A literal string is represented in the metalanguage 
by the desired string of characters enclosed in single 
quotation marks (’). Any character may be used within 
a literal string, except that a single quotation mark is 
represented by two adjacent single quotes for each 
occurrence in the literal string in order to differentiate 
it from the ending delimiter of the literal string. 
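
The quote-doubling rule can be illustrated with a short Python sketch. The function name `decode_literal` is hypothetical; it only shows how a metalanguage literal maps to the characters it stands for.

```python
def decode_literal(token: str) -> str:
    """Return the characters denoted by a quoted metalanguage literal."""
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError("a literal string must be enclosed in single quotes")
    body = token[1:-1]
    # Every quote inside the literal must be one half of a doubled pair.
    if "'" in body.replace("''", ""):
        raise ValueError("an embedded single quote must be written twice")
    return body.replace("''", "'")

print(decode_literal("'END'"))
print(decode_literal("'DON''T'"))
```

A lone quote inside the body is rejected because it would be read as the ending delimiter.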

Alternatives 

If a syntactic unit in the base language may have 
alternative representations, these alternatives may be 
represented in the metalanguage as a single production 
with the alternatives of the syntactic unit each appear- 
ing on the right side and separated from each other by 
the conventional OR symbol (|). 

Example: (DIGIT): '1'|'2'|'3'|(OTHER)

Complex substructures (Unnamed syntactic units)

If one does not wish to break down and label a syn- 
tax substructure in detail, but simply label an entire 
complex substructure as a syntactic unit, pairs of pa- 
rentheses may be used as grouping indicators. Consider 
the following equivalent examples of a definition of 
the syntactic unit (NUM4). 

Example: (NUM): '2'|'3'|'4'

(NUM2): '3'|'4'|'5'

(NUM3): '5'|'6'|'7'

(NUM4): '1' (NUM) (NUM2) | '1' (NUM3)

Example: (NUM4): '1' (('2'|'3'|'4') ('3'|'4'|'5') | ('5'|'6'|'7'))

Grouping may occur to any depth desired and each 
quantity within the grouping parentheses must have 
the form of any legal right side of a production. 

Quantity repetition and bounds 

Often in the syntax of a base language a (named or
unnamed) syntactic unit or literal string may be required
to occur several times. Or it may be desirable
to specify that a syntactic structure be a function of
the length of an input string in addition to other qualifications
(e.g., a label field of exactly five characters
and consisting of . . . ).

To indicate either the repetition of a string (i.e., the 
input string defined by a syntactic structure) or the 
length bound on the number of characters in some 


string, an operator group must precede the respective 
quantity in the syntax. The operator group is of the 
form n$m or n&m for the string and character counters 
respectively, where n is an integer representing the 
lower bound and m, an integer representing the upper 
bound. 

Consider the following example. 

(A) : 3$3 (SUB-STRUCTURE)

(B) : 3&3 (SUB-STRUCTURE)

(C) : 'C'

(SUB-STRUCTURE) : 0$5 (C) 1$3 'AB'

The first production defines (A) as exactly three strings
of (0$5(C)1$3'AB'). Thus, acceptable strings for (A)
might be ABABAB or ABCABCCCCABAB or CCABABCABAB,
etc. However, (B) is defined as exactly
three characters which are otherwise defined as in (A).
Thus, (B) can be only CAB; no other combinations
will yield exactly three characters. Notice that the
string counter differs from the character counter in that
it is distributed over all inner strings whereas the character
counter represents an absolute bound over a given
substructure.
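
The difference between the two counters can be checked with a small Python sketch. Regular expressions are used here purely as a stand-in for the SYMPLE parser, and the names `matches_a` and `matches_b` are hypothetical.

```python
import re

# (SUB-STRUCTURE): zero to five C's followed by one to three occurrences of AB.
SUB = r"C{0,5}(?:AB){1,3}"
SUB_PATTERN = re.compile(SUB)
# (A): the string counter asks for exactly three SUB-STRUCTUREs in a row.
A_PATTERN = re.compile(rf"(?:{SUB}){{3}}")

def matches_a(text: str) -> bool:
    return A_PATTERN.fullmatch(text) is not None

def matches_b(text: str) -> bool:
    # (B): the character counter bounds the total length to exactly three,
    # and the text must still be a SUB-STRUCTURE.
    return len(text) == 3 and SUB_PATTERN.fullmatch(text) is not None

for s in ["ABABAB", "ABCABCCCCABAB", "CCABABCABAB"]:
    print(s, matches_a(s))
print("CAB", matches_b("CAB"))
```

Under these assumptions all three sample strings are accepted for (A), while CAB is the only three-character string accepted for (B).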

When productions include quantities with repetition 
counts, the parser which utilizes these productions will 
attempt to find the largest number of those quantities 
in the input source consistent with the upper bound of 
repetitions. If the input contains more than the upper 
bound of these quantities, the input string correspond- 
ing to the upper bound count of quantities will be recog- 
nized and succeeding repetitions will be analyzed ac- 
cording to the syntax following. A lower bound count 
of zero is allowable and simply indicates the optional 
omission of the quantity. 

The absence of an explicit lower bound implies a 
lower bound of one. The absence of an explicit upper 
bound implies an upper bound which is the maximum 
bound allowable in the system. In the present imple- 
mentation it is 32767. It should be noted that 

1$1(SYUN) and (SYUN) are equivalent, as are

$(SYUN) and 1$32767(SYUN)
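
The default rules for omitted bounds can be summarized in a short Python sketch. The function name `parse_operator_group` and the returned tuple layout are hypothetical; only the defaults (lower bound 1, upper bound 32767) come from the text.

```python
import re

MAX_BOUND = 32767  # maximum bound in the implementation described above
_GROUP = re.compile(r"([0-9]*)([$&])([0-9]*)")

def parse_operator_group(group: str):
    """Split an operator group such as '0$5' or '5&5' into (kind, lower, upper)."""
    m = _GROUP.fullmatch(group)
    if m is None:
        raise ValueError(f"not an operator group: {group!r}")
    lower = int(m.group(1)) if m.group(1) else 1          # missing lower bound
    upper = int(m.group(3)) if m.group(3) else MAX_BOUND  # missing upper bound
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    kind = "string" if m.group(2) == "$" else "character"
    return kind, lower, upper

print(parse_operator_group("$"))
print(parse_operator_group("0$"))
print(parse_operator_group("5&5"))
```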

Complement look-ahead 

The symbol ¬ preceding a literal string, syntactic unit
or grouping indicates that at that point in the syntax
the quantity indicated must not occur. This is called a
complement look-ahead for the indicated quantity at
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parse time. If the quantity is found, the parse being 
attempted has failed. (Any syntactic units found on the 
look-ahead will not result in macro template match 
attempts.) If the quantity is not found, the parse con- 
tinues as before the complement look-ahead. 

Example: (LETTER): 'A'|'B'|'C'|'D'|'E'

(SPLTRSTRG): $(¬'C' (LETTER))

The strings recognized as (SPLTRSTRG) will be any 
string which consists of one or more of A, B, D or E, 
but not C. 
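
A negative look-ahead in a regular expression behaves much like the complement look-ahead described here, which the following Python sketch uses as an analogy. The pattern is an assumption standing in for the (SPLTRSTRG) production, not SYMPLE's own mechanism.

```python
import re

# $( ¬'C' (LETTER) ): one or more letters A-E, each checked first to not be C.
SPLTRSTRG = re.compile(r"(?:(?!C)[A-E])+")

print(SPLTRSTRG.fullmatch("ABDE") is not None)
print(SPLTRSTRG.fullmatch("ABCD") is not None)
print(SPLTRSTRG.match("ABCD").group())
```

The look-ahead consumes no input: it only vetoes the letter that follows, so the recognized string stops just before the first C.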

Scan positioning 

The production defining a syntactic unit can be made
to include, without investigation as to structure, an
arbitrary length of input, or it may require that a
particular syntactic unit in the input conform to more
than one syntactic structure. This is done by explicitly
positioning the location at which the parser is “looking.”
This location, called the scan position, can be adjusted
either relative to its present position or to the beginning
reference points in the syntax of the parsed input.

a — X (Space) positioning 

The occurrence of the symbol X immediately followed 
by an unsigned integer number and delimited by brack- 
eting commas at any point in the right side of a pro- 
duction will cause the scan position to be adjusted 
rightward from its present location the integer number 
of positions specified. The symbol X and following 
number must be bracketed on both sides by commas 
except in the following cases: X is the first (last) 
symbol of a grouping level or the first (last) symbol of 
the right side of a production, in which case the left 
(right) comma is not required. 

Example: Define an (END-CARD) to be an 
80 character string. The first six characters must be 
blanks, the next 66 characters must have the word 
END somewhere with the rest blanks, and the last 
eight characters may be anything. 

(END-CARD): 6&6' ' 66&66(0$' ' 'END' 0$' '), X8
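
The (END-CARD) requirement stated in the example can be expressed as a plain Python check. This sketch follows the prose description only; the function name `is_end_card` is hypothetical.

```python
def is_end_card(card: str) -> bool:
    """Check an 80-character card against the END-card description."""
    if len(card) != 80:
        return False
    if card[:6] != " " * 6:          # first six characters must be blanks
        return False
    middle = card[6:72]              # next 66 characters
    if middle.strip(" ") != "END":   # END somewhere, everything else blank
        return False
    # The last eight characters are skipped without inspection (X8).
    return True

card = " " * 6 + "END".center(66) + "00000010"
print(is_end_card(card))
print(is_end_card(card.replace("END", "ENDS", 1)[:80]))
```

Skipping the final eight characters corresponds to moving the scan position without investigating what lies under it.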

b — T (Tab) positioning 

The format is similar to that of X positioning, except 
a T is used instead of an X. 

The T scan positioning results in the scan position
being moved the specified number of places to the
right of the beginning location at which the parse began
at (1) this grouping level, if the T positioning appears
within a grouping parenthesis pair, or (2) the right side
of the production otherwise.

Example: A syntactic unit (EMPLOYEE-NO.)
is defined to be an 80 character string with a syntactic
unit (LAST-NAME) beginning in position one, followed
by a single blank and then the syntactic unit
(FIRST-NAME). Exactly 15 spaces after the beginning
of (FIRST-NAME) is to appear the syntactic
unit (CODE). Finally (NUMBER) will be 75 spaces
from the beginning of (EMPLOYEE-NO.).

(EMPLOYEE-NO.): (LAST-NAME) ' '
((FIRST-NAME), T15, (CODE)), T75,
(NUMBER)
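
The two reference points used by T positioning (the start of the enclosing group versus the start of the production) can be seen in a Python sketch of this record layout. The function name and the treatment of each field as a blank-padded run of characters are assumptions made for illustration; the paper does not define the field structures.

```python
def split_employee_record(card: str) -> dict:
    """Split an 80-character record using the offsets from the example."""
    if len(card) != 80:
        raise ValueError("record must be 80 characters")
    blank = card.find(" ")
    if blank <= 0:
        raise ValueError("LAST-NAME must be non-empty and followed by a blank")
    group_start = blank + 1         # the parenthesized group begins here
    code_start = group_start + 15   # T15: relative to the start of the group
    number_start = 75               # T75: relative to the start of the production
    if code_start > number_start:
        raise ValueError("CODE would start beyond NUMBER")
    return {
        "last_name": card[:blank],
        "first_name": card[group_start:code_start].rstrip(),
        "code": card[code_start:number_start].rstrip(),
        "number": card[number_start:],
    }

record = "SMITH " + "JOHN".ljust(15) + "X9".ljust(54) + "00042"
print(split_employee_record(record))
```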

Recursive grammars in the metalanguage 

Recursive grammars (i.e., productions with the 
syntactic unit of the left side occurring as well on the 
right side, or being in the derivation of a syntactic 
unit of the right side) are allowed in the metalanguage 
subject to certain conditions. 

For instance, left recursive productions are not al- 
lowable, but other recursive productions are allowable. 

Further, the character (&) bound counts are cumula- 
tive from the initial (top) occurrence in a recursive 
parse while the repetition bounds ($) are effective at 
each level of recursion. 

Non-specific grammars in the metalanguage 

Let a non-specific grammar be one in which the
particular alternatives of structure for a syntactic unit
may have structurally the same headings (i.e., leading
components which are structurally the same). The metalanguage
allows the specification of such grammars
and at recognition time the parser always picks the
first specified (or left most) alternative as its initial
guess. Subsequent guesses continue with the next
specified alternatives.

The user must be aware of the possible consequences
if the apparent ambiguity in a non-specific grammar
causes the recognition of syntactic units to be rejected
later as a result of an unsuccessful parse. Though the
back-up to the next alternative is handled automatically
by the parser, the syntactic units recognized may
result in macro invocations, the results of which will
not automatically be negated. Relevant user aids in
this area are provided by the system.

The following example illustrates a parsing grammar 
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for a language which is context sensitive and not con- 
text free and which utilizes recursive productions. 

L = {0^n 1^n 0^n : n >= 1}

(LANG): (LSTR) ¬'1', T1, $'0' (RSTR)

(LSTR): '0' (LSTR) '1' | '01'
(RSTR): '1' (RSTR) '0' | '10'

The parser first determines that the input string
belongs to the context-free language 0^n 1^n x; checks to
make sure x does not begin with a 1; repositions to the
beginning of the parsed substring of 1's and then determines
that the remaining substring of the input
string belongs to the context-free language 1^n 0^n. If
the above conditions are true, then the input string
belongs to the context-sensitive language 0^n 1^n 0^n.
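
The two-pass recognition just described can be sketched in Python. This is an illustration of the idea (two context-free checks joined by a look-ahead and a repositioning), not SYMPLE's parser; the names `parse_nested` and `in_language` are hypothetical.

```python
def parse_nested(text, pos, open_ch, close_ch):
    """Match open_ch^n close_ch^n (n >= 1) at pos; return the end position or None."""
    n = 0
    while pos + n < len(text) and text[pos + n] == open_ch:
        n += 1
    if n == 0:
        return None
    end = pos + n
    if text[end:end + n] != close_ch * n:
        return None
    return end + n

def in_language(text: str) -> bool:
    """True if text is 0^n 1^n 0^n for some n >= 1."""
    # Pass 1: the prefix must be 0^n 1^n.
    left_end = parse_nested(text, 0, "0", "1")
    if left_end is None:
        return False
    # Complement look-ahead: whatever follows must not begin with a 1.
    if text[left_end:left_end + 1] == "1":
        return False
    # Reposition to the start, skip the leading 0's, and rescan from the 1's.
    ones_start = len(text) - len(text.lstrip("0"))
    # Pass 2: the rest must be 1^n 0^n and must reach the end of the input.
    return parse_nested(text, ones_start, "1", "0") == len(text)

for s in ["010", "001100", "0110", "00110", "0011000"]:
    print(s, in_language(s))
```

The substring of 1's is examined twice, once by each pass, which is what ties the three counts together.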

The SYMPLE macro facility 

The macro facility of SYMPLE provides the actual 
translation mechanisms. The macros themselves are 
read in to the system following the base language 
grammar and prior to the user's source deck. The in- 
dividual macro definitions are described in this section. 

MACRO FORMAT 

The overall format of an individual macro definition
is as follows:

<macro name> (<syntactic unit>) = <template body> / (<syntactic unit>) = <template body> ... ;
macro semantic statements
END;

The exact format and meaning of the various parts 
are described in the balance of this section. 

Macro name

The first item to appear in the macro is the name of 
the macro. The name may be any string of characters, 
excluding those special characters previously men- 
tioned as excluded from a syntactic unit name. The 
macro name is used exclusively as a “handle" for the 
user's organization and SYMPLE's internal system 
and macro referencing. The macro name should not 
be confused with a macro reference in the source text. 
A source reference to the macro is completely inde- 
pendent of its name. 

Templates 

Following the macro name is a series of macro
templates which are descriptions of possible macro
references that will cause the invocation of the macro.
A single macro template is of the form:

(<syntactic unit>) = <template body>

where the syntactic unit is any syntactic unit that may 
occur in the base language, and the template body, if 
present, consists of a description of a specific structure 
to be found within that syntactic unit. The syntax and 
semantics of template body are identical with those 
of the metalanguage of SYMPLE except for an exten- 
sion to make it possible to identify and name argument 
strings for the macro. 

The extension added to facilitate the identification 
and naming of argument strings was simply to allow 
the enclosing of the desired argument location in the 
syntactic structure of the template within bracketing 
parentheses and preceding the left enclosing parenthesis 
with a name (with the same character restrictions as a 
macro name) to be associated with the enclosed argu- 
ment string. These enclosed argument strings may 
occur anywhere within the template, and in fact may 
even enclose other argument strings. The names associat- 
ed with the argument strings must be unique within a 
single macro template. 

A macro template may cause a macro invocation in 
the following manner. When the syntactic unit desig- 
nated on the left of the equal sign in a macro template 
is recognized by the parser, the actual structure of the 
syntactic unit found is compared with the specific 
syntax specified in the template body. A successful 
comparison results in the invocation of the macro and 
the passing to the macro of identified argument strings 
in the macro reference, if any. If no template body is 
specified, then the macro is immediately invoked with 
no arguments passed. 

The syntax structure defined in a template body 
need not be structurally consistent with that of the 
object syntactic unit in which it will be compared. 
However, if the template body contains syntactic units, 
these units must have been in the productions submitted 
with the description of the base language. These pro- 
ductions though can be stand-alone productions (not 
logically in the normal base language structure) in- 
cluded solely for use within templates. The use of these 
stand-alone syntactic units, literal strings, and alterna- 
tive arrangements and selection of syntactic units in 
the base language can result in template structures 
quite different from those recognized in the process of 
finding the object syntactic unit. Thus the template 
comparison is actually an attempted reparsing within 
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the physical bounds of the object syntactic unit accord- 
ing to the template syntax description. 

Any number of macro templates may follow the 
macro name, with a slash (/) separating each, except 
that the last template is followed by a semicolon (;). 

Example: NO1 (LABEL) = A1 ('    ' A2 ((NUM))) / (STMT) = 'C' A3 (X79);
macro semantic statements
END;

Macro NO1 will be invoked when either

1. A (LABEL) is found consisting of four blanks
followed by a (NUM), or else

2. A (STMT) is found beginning with the letter C.

In case 1 two argument strings will be available for
manipulation and testing by the macro semantic
statements; that associated with the string name A1
will be four blanks and the found (NUM); that associated
with the string name A2 will be just the found
(NUM). In case 2, the argument string associated with
string name A3 will be the 79 characters following the
initial letter C.
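
Named, nested argument strings behave much like named groups in a regular expression, which the following Python sketch uses as an analogy for this example. It assumes that (NUM) is a run of digits and that a template is matched from the start of the syntactic unit; neither detail is stated in the text, and the function name `match_no1` is hypothetical.

```python
import re

# (LABEL) = A1('    ' A2((NUM))): A2 is nested inside A1.
LABEL_TEMPLATE = re.compile(r"(?P<A1> {4}(?P<A2>[0-9]+))")
# (STMT) = 'C' A3(X79): the 79 characters after the initial C.
STMT_TEMPLATE = re.compile(r"C(?P<A3>.{79})", re.DOTALL)

def match_no1(unit_name: str, text: str):
    """Return the argument strings if a template matches, else None."""
    template = {"LABEL": LABEL_TEMPLATE, "STMT": STMT_TEMPLATE}.get(unit_name)
    if template is None:
        return None
    m = template.match(text)
    if m is None:
        return None
    # Names absent from the matched template keep the null string.
    args = {"A1": "", "A2": "", "A3": ""}
    args.update(m.groupdict())
    return args

print(match_no1("LABEL", "    12"))
print(match_no1("LABEL", "  12"))
```

In the first call A1 holds the four blanks plus the number and A2 holds the number alone; the second label has only two blanks and does not invoke the macro.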

Argument string names which are not in a matched
template or which are associated with null argument
strings in the matched template are associated with
the null string (i.e., have a length attribute of zero).

Macro semantics 

a — General 

The macro semantics facility in SYMPLE is based 
on a string oriented language which drives an inter- 
pretive mechanism. This language closely parallels 
SNOBOL and has a simple syntax. The basic form of 
most semantic statements is 

<action verb>, <string name> = <string reference>, <string reference> ... ;

where the action verb is a key word describing some 
action to be performed on the referenced strings (literal 
strings, string names, etc.) with the resultant string 
generally being associated with the given string name. 
The details of the semantic language facility are 
described in another paper. 13 The use of relatively 
simple semantic statements in later examples should be 
intuitively understandable. 

This semantic language provides the ability to: 

1. manipulate strings of characters 

2. reference strings literally, directly, indirectly 


3. reference strings with concise notations 

4. communicate between macros 

5. execute subroutine-like macros 

6. manipulate strings of values 

7. alter sequential execution (branch) 

8. insert strings back into the ground language code 

9. loop repetitively 

10. perform string comparisons

11. display string-string name associations

12. terminate interpretive action

to which needs to be added for our discussion one
capability not explicitly mentioned: the ability to
dynamically alter entire macros (templates and semantics).

This last capability mentioned and number 8 listed 
above are the means by which the macros effect 
their results in the translation process. 

b — Output string insertion 

Strings which are produced in the macro semantic 
portion of a macro may be inserted into the source code 
in any of several ways. The semantic language state- 
ment which directs the insertion of a string is of the 
form: 

INSERT, <directive> = <string name(s)>;

The directive is a code rather than a string name which
specifies the type of insertion to be performed. The
directive codes are I, IA, IB, A, B, A,<digit>, B,<digit>,
PI, PIA, PIB, PA, PB, PA,<digit>, PB,<digit>
and MADD.

They are explained below. 

I— The string name(s) is an argument 
string name(s). The associated 
string is to replace the argument 
string occurrence in the macro 
reference. 

IA — The string name(s) is an argument
string name(s). The associated
string is to be inserted immediately
after the referenced argument
string in the macro reference. In
this, and for all remaining insertion
directives, the macro reference
itself remains unchanged.

IB — Same as IA except read “before”
instead of “after”.
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A — The string(s) associated with the 
string name(s) is to be inserted 
immediately after the syntactic 
unit in which the current macro 
reference occurred. 

B — Same as A except read “before”
instead of “after.”

A,<digit> — The string(s) associated with the
string name(s) is to be inserted
after a particular syntactic unit
or grouping level of the parsed
tree, called the referenced syntactic
unit (RFSYUN). The RFSYUN
is the first syntactic unit
(at the same or higher level) to
the immediate left of the syntactic
unit or grouping level on the
parsed tree, whose derivation includes,
and is the value of <digit>
levels above, the present macro
reference. If a RFSYUN does
not exist by the above definition
then the directive A,<digit>
references the beginning of the
input stream.

B,<digit> — Same as A,<digit> except read “before”
instead of “after”.

P prefix directives — (e.g., PI, PIA, etc.) Each P
prefix directive results in the
same type of insertion as the non-prefixed
directives. However, the
string inserted is transparent to
all future attempts at parsing or
template matching (i.e., “protected”).
The only exception to
this is that a P prefix inserted
string will be visible to the template
matching of a specially
designated macro, called the “edit”
macro, whose name is specified
at submission time via the processor
control language. All P
prefix inserted strings, if unaltered
by the edit macro, will appear
in their inserted locations
in the final output.
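
The five basic directives differ only in where the new string lands relative to the argument string and its enclosing syntactic unit. A minimal Python sketch of that difference follows; the function name `insert` and the use of (start, end) index pairs to locate the unit and the argument are assumptions, and the level-based and P-prefix directives are not modeled.

```python
def insert(source, unit_span, arg_span, directive, new_text):
    """Apply one of the directives I, IA, IB, A, B to source and return the result.

    unit_span -- (start, end) of the syntactic unit holding the macro reference
    arg_span  -- (start, end) of the referenced argument string
    """
    unit_start, unit_end = unit_span
    arg_start, arg_end = arg_span
    if directive == "I":    # replace the argument string
        return source[:arg_start] + new_text + source[arg_end:]
    if directive == "IA":   # immediately after the argument string
        return source[:arg_end] + new_text + source[arg_end:]
    if directive == "IB":   # immediately before the argument string
        return source[:arg_start] + new_text + source[arg_start:]
    if directive == "A":    # after the enclosing syntactic unit
        return source[:unit_end] + new_text + source[unit_end:]
    if directive == "B":    # before the enclosing syntactic unit
        return source[:unit_start] + new_text + source[unit_start:]
    raise ValueError(f"directive not modeled in this sketch: {directive}")

stmt = "X = FOO(Y);"
for code in ["I", "IA", "IB", "A", "B"]:
    print(code, insert(stmt, (0, len(stmt)), (4, 7), code, "BAR"))
```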

c — Dynamic macro modification 

In addition to inserting strings in the source submission,
strings may be treated as new/changed macros
via the following directive.

MADD — The string associated with the string 
name is a macro and includes macro templates 
and/or macro semantics. If the macro is new (no 
other macro with the same name) it will be added to 
the present library of macros for this submission. 
If the macro name is that of a current macro, macro 
templates, if present will be added to those presently 
associated with the macro and macro semantics, if 
present, will replace those of the present macro. 
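
The add-or-update rule for MADD can be sketched in Python as follows. The dictionary layout of the macro library and the function name `madd` are hypothetical; only the rule itself (templates accumulate, semantics are replaced) is taken from the description above.

```python
def madd(library, name, templates=(), semantics=None):
    """Add a new macro, or merge into an existing macro of the same name."""
    entry = library.get(name)
    if entry is None:
        # New macro: add it to the library for this submission.
        library[name] = {"templates": list(templates), "semantics": semantics}
        return
    # Existing macro: new templates are added to those already present ...
    entry["templates"].extend(templates)
    # ... while new semantics, if given, replace the old ones.
    if semantics is not None:
        entry["semantics"] = semantics

library = {}
madd(library, "NO1", ["(LABEL) = ..."], "first semantics")
madd(library, "NO1", ["(STMT) = ..."])
madd(library, "NO1", semantics="second semantics")
print(library)
```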

CONCLUSION 

The purpose of the SYMPLE system is to provide a 
general language-independent macro preprocessor. The 
syntax directed approach was used to allow both general 
and flexible macro referencing techniques. 

The SYMPLE syntax description metalanguage
was designed from the premise that the metalanguage
should be a practical tool for real programming languages
with their many syntactic idiosyncrasies (e.g.,
imbedded blanks, fields of specified length, continuation
columns, etc.). As far as possible and practical
these real problems should be easily describable in
the SYMPLE syntax description metalanguage. In a
standard BNF metalanguage, such problems are at
best very awkward to describe. This led to such concepts
as length and repetition binding, and explicit
scan positioning.

Explicit scan positioning added the ability to perform
successive analyses, even within a local template
match, by repositioning the analyzer for rescan of
already parsed information. This rescanned information
may, of course, contain different information as the
result of insertions from macro invocations.

The insertion of information in the “protected” mode 
(P-prefix directive insertions) further extends the 
power of the scan and rescan mechanism of the syntax 
analyzer. It allows the user the option to insert code 
which either may possibly affect the future syntax 
analysis (normal mode), or be completely “transparent” 
and thus not possibly affect subsequent syntactic 
analyses. 

Systems such as TMG , 10 COGENT 14 and similar 
syntax directed compilers or compiler-compilers have 
their semantic actions hooked to the parsed syntactic 
units of a source submission, much the way SYMPLE 
would do without the local syntax parsing of a macro 
template. In the context of macro processors, however, 
the application of global syntactic analysis followed 
by local syntax analysis for the macro references ap- 
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pears to be a new application. The obvious advantage 
of this technique is that it provides a means of specify- 
ing a contextual dependence for macro references. 
Patterns in the source input which would qualify as 
macro references on a local syntax basis will qualify 
only if they are in the correct global context. 

Several previous macro systems [notably XPOP,7
ML/I,2 LIMP19] use some sort of generalized macro
reference technique. Most use a template matching
technique based on pseudo-syntax methods (e.g.,
noise word structuring of XPOP, specific literal tem- 
plate structures of LIMP). In each case, however, the 
scope of applicability of these macro references was 
not controlled on a global syntactic basis. ML/I, for 
instance, depends on the occurrence of a name of a 
macro in a statement for the recognition of a macro 
reference. XPOP looks for a macro reference in each 
statement based on word structures, with non-“noise” 
words in these structures being the arguments. Macro 
references in LIMP are perhaps the most general of 
the above mentioned systems. However, the templates 
of LIMP are (1) literal templates (i.e., character 
strings, not defined syntax structures) with “holes”
in them, the “holes” being filled by the required argu- 
ments; and (2) each template is eligible in any given 
“line” of input. Thus there is no discrimination in re- 
gard to the applicability of a template on a global 
basis in any of the above mentioned systems; nor is 
there structuring of the templates themselves on a 
general syntactic basis; nor can the arguments be 
identified in a really general manner. It would take 
little to show that, at least from a macro reference 
point of view, these systems would be relatively simple 
special instances in SYMPLE. 

The general applicability of the SYMPLE system 
has been alluded to, and a few mostly simple examples 
are illustrated in the appendix. These examples illus- 
trate the use of SYMPLE as a language extension 
facility, in handling “sift” problems, and text editing. 
There are certain to be many other areas of appli- 
cability not mentioned. 

APPENDIX I 

SYMPLE processing examples
Example 1 

The first example of this appendix is designed to 
take OS/360 Fortran IV input and condense all non- 
comment statements into single condensed strings by 
eliminating unnecessary blanks, sequence number 
fields, and continuation fields. Each condensed state-
ment will be separated by a record mark (!). Processor
control information is included for completeness.

SYNTAX; 

(PROG) :$(STMT) (END-CARD)

(STMT) : (LABEL-FIELD) (‘0’|‘ ’) 
(UNLAB-STMT) | (COMMENT) 
(COMMENT) : ‘C’, T81, ‘!’

(LABEL-FIELD) : 5 & 5((BLKSTRG) (NUM))
(UNLAB-STMT) : (END-BODY) (BLKSCN)
(SEQFIELD) 0$19 ((CONT-FIELD)
(BLKSCN) (SEQFIELD))

(END-CARD): 6$6 ‘ ’ (END-BODY) (SEQ- 
FIELD) 

(END-BODY) : 66 & 66 (0$1 (BLKSTRG) 
‘END’ 0$1 (BLKSTRG)) 

(BLKSCN) : 66 & 66 (0$1 (BLKSTRG) $ 
(NONBLK)) 

(BLKSTRG) : ’ 

(NONBLK) :_ i ‘ Xl|“”$ (-*“”, Xl)“” 

(NUM) : ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’ 
(SEQFIELD) :X7,‘!’ 

(CONT-FIELD) : ¬‘C’, T6, ¬(‘0’|‘ ’), X1
SYNEND; 

MACROS; 

CONDENSE (BLKSTRG) = A1 ((BLKSTRG))/ 
(SEQFIELD) = A1 ((SEQFIELD))/ 
(CONT-FIELD) = A1 ((CONT-FIELD)); 

REPLACE, A1=;

INSERT, I = A1; STOP; END;

END-STMT (UNLAB-STMT) =; INSERT,
A=‘!’; STOP; END;

MACEND; 

SOURCE, RECMK ; 

      INTEGER *2 AA ( 4 ) / 'A B C D' / ,                        1
     1 BLK / ' ' / , VAL ( 1 0 0 ) / 1 0 0 * 0 /                 2
      DO 10 K = 1 , 100                                          3
      VAL ( K ) = VAL ( K ) +                                    4
     1 9 5 * K                                                   5
     A - AA ( 4 )                                                6
   10 AA ( K / 1 0 0 ) = BLK                                     7
10000 STOP                                                       8
      END                                                        9


Output from SYMPLE after processing above input

INTEGER*2AA(4)/'A B C D'/,BLK/' '/,VAL(100)/100*0/!DO10K=1,100!
VAL(K)=VAL(K)+95*K-AA(4)!10AA(K/100)=BLK!10000STOP!END!

Notes on example: 

1. The grammar of Fortran IV is detailed here
only to a level which will distinguish major
substructures. If one wished to further detail
the syntax structure, the syntax of Fortran
statements in the condensed form would be
relatively simple since all extraneous clutter
has been removed. The P-prefix insert capability
could be used to ignore clutter for possible
reparsing without actually removing it from the
input (and thus output).

2. The grammar is non-specific with at least one
point of apparent ambiguity. The beginning
characters of an (END-CARD) will qualify
as the beginning characters of a (STMT) (i.e.,
6$6 ‘ ’ = (LABEL-FIELD) ‘ ’). Thus upon en-
countering an (END-CARD) there will be a
back-up, since an attempt is first made to parse
it as a (STMT). In this case, of course, the
back-up will not have any bearing on the total
processing result.

3. (LABEL-FIELD) will accept a label, say
bb1b5 (where b denotes a blank), and the compressed
result would be 15. The structure of this particular
(LABEL-FIELD) would be

bb 1 b 5

(BLKSTRG) (NUM) (BLKSTRG) (NUM).

4. Note how (NONBLK) includes all non-blank 
characters and literal strings (including blanks).

5. The macro template (UNLAB-STMT) = ; in 
the second macro results in the macro END- 
STMT being invoked with no arguments. 

6. The use of the processor control RECMK par-
ameter results in a ! being added to the end of
each logical record on input. The syntax gram- 
mar used assumes this, though an equivalent 
grammar without the RECMK could easily 
be used in this case. 
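
The effect of Example 1 can be approximated outside SYMPLE by the following minimal Python sketch. It is not the SYMPLE mechanism itself: it assumes the usual fixed-form card layout (label in columns 1-5, continuation mark in column 6, statement text in columns 7-72, sequence field in columns 73-80), treats apostrophes as literal delimiters, and simply skips comment cards. The names `strip_blanks` and `condense` are hypothetical.

```python
def strip_blanks(text):
    """Remove blanks that lie outside apostrophe-delimited literals."""
    out, in_literal = [], False
    for ch in text:
        if ch == "'":
            in_literal = not in_literal
        if ch != " " or in_literal:
            out.append(ch)
    return "".join(out)


def condense(cards, record_mark="!"):
    """Join continuation cards and condense each statement (assumed card layout)."""
    statements = []
    for card in cards:
        card = card.ljust(80)
        if card[0] == "C":                      # comment card: skipped in this sketch
            continue
        body = card[6:72]                       # columns 7-72; drops the sequence field
        if card[5] not in " 0" and statements:  # column 6 marks a continuation card
            statements[-1] += body
        else:
            statements.append(card[0:5] + body) # label field plus statement text
    return "".join(strip_blanks(s) + record_mark for s in statements)
```

Blanks are stripped only after continuation cards have been joined, so a literal that spans two cards keeps its embedded blanks (along with any card padding that falls inside it).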

Example 2 

This example is designed to remove all redundant
parentheses in a language which uses pairs of left and
right parentheses for grouping. A redundant pair of
parentheses is any pair which encloses
a string which is itself totally enclosed in parentheses.

SYNTAX; 

(FLANG) :$(PAREN) 

(PAREN) :^(INARDS)0$1(INTOO)

(INARDS) : (PAREN) | (INTOO)

(INTOO) :0$(¬‘)’ ¬‘(’, X1) 0$1 (PAREN)
SYNEND; 

MACROS; 


REDUN (PAREN) = ‘(’ AA((INARDS)) ‘)’;
SEPART, AA = ‘(’, AA, ‘)’/¥, L1; INSERT, I =
AA; L1: STOP; END;

MACEND; 

SOURCE, LIST; 

(((A(B)))C)((((XYZ((Q))(A))F))) 

/* 

Output from SYMPLE after processing
((A(B))C)((XYZ(Q)(A))F)

Note: In a recursive parse, inner-most (lowest) 
recursive syntactic units [e.g., (PAREN)] are 
recognized first, and subject to macro expansion 
first. 
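
The transformation of Example 2 can be illustrated independently of SYMPLE with a short Python sketch. It is an assumption-laden stand-in, not the macro shown above: the input is parsed into nested lists, and any group whose sole content is another group is collapsed, innermost groups first. The function names are hypothetical.

```python
def parse(text):
    """Parse a parenthesized string into nested lists; other characters are leaves."""
    root = []
    stack = [root]
    for ch in text:
        if ch == "(":
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


def simplify(group):
    """Collapse a group whose only content is a single inner group."""
    items = [simplify(x) if isinstance(x, list) else x for x in group]
    if len(items) == 1 and isinstance(items[0], list):
        return items[0]
    return items


def unparse(items):
    """Turn nested lists back into a parenthesized string."""
    return "".join("(" + unparse(x) + ")" if isinstance(x, list) else x
                   for x in items)


def remove_redundant(text):
    """Remove every redundant pair of parentheses from text."""
    top_level = parse(text)
    cleaned = [simplify(x) if isinstance(x, list) else x for x in top_level]
    return unparse(cleaned)
```

Applied to the source line of Example 2, `remove_redundant` yields the same string as the SYMPLE output listed above.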

Example 3 

A final example shows a simple extension of OS/ 
360 Fortran IV obtained by adding a different state- 
ment type to the grammar. This different statement 
type will contain a macro reference. The format of, 
and argument location in, the macro references will 
be strictly dependent on the local syntax specified in 
the templates of the macros. 

A different statement type could be designated 
simply as starting with a non-numeric non-blank 
character after column 1 and before column 6. 
The grammar defining this basic extension could ap- 
pear in a submission as follows. 

SYNTAX, PUT; 

(PROG) :$(STMT) (END-CARD) 

(STMT) :(NEW-STMT)| (END-CARD), T80 
(NEW-STMT) :5&5($‘ '$(NONNUM-BLK)),T80 
(END-CARD) : 6&6 ‘ ’ 66&66 (0$‘ ’ (‘END’) 0$‘ ’), X8

(NUM) : ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’

(NONNUM-BLK) : ¬(NUM) ¬‘ ’, X1
SYNEND; 

At this point the syntax description differentiating
this new statement type is defined, and any user could
take advantage of the description, which has been
saved via the processor control PUT parameter. Using the
appropriate processor control and job control state- 
ments to retrieve the above syntactic specification, a 
user could make submissions similar to the following. 

SYNTAX, GET; 

(NOISE) : ‘ ’|‘STORE’|‘IN’|‘TO’|‘INTO’|‘THE’|
‘PUT’|‘OF’|‘AND’
(NON-NOISE) : $(¬(NOISE), X1)

SYNEND; 

MACROS; 

SUM (NEW-STMT) = A1 ($ (NOISE) (‘ADD’| 
‘SUM’)$(NOISE)A2((NON-NOISE)) $ 
(NOISE) A3 ((NON-NOISE)) $ (NOISE) 
A4 ((NON-NOISE)), T80) ; 

CONCAT, A1 = ‘ ’, A4, ‘ = ’, A2, ‘+’, A3;

INSERT, I =A1; 

STOP; END 
MACEND; 

SOURCE; 

C THIS IS A FORTRAN COMMENT 
ADD A TO B AND STORE IN C 

SUM A AND B AND PUT INTO C 
STORE THE SUM OF A AND B IN C 
END 

/* 

Output of SYMPLE after processing 

C THIS IS A FORTRAN COMMENT 

C = A + B 
C = A + B
C = A + B 
END 

The macro used above is a simple macro using a key-
word and non-noise positional parameters. The il-
lustrated new type of statement, if imbedded in any
Fortran source deck, would, when processed, be con-
verted to the Fortran statements listed, which would re-
place the new statements.
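
The keyword and noise-word idea behind the SUM macro can be sketched in Python as follows. This is only an illustration under stated assumptions: words are separated by blanks, the noise words are those of the (NOISE) definition above, and the three remaining words are taken positionally as the two operands and the target. It does not model SYMPLE's global syntax check, which decides beforehand which cards are new-type statements at all. The name `expand_sum` is hypothetical.

```python
NOISE = {"TO", "STORE", "IN", "INTO", "THE", "PUT", "OF", "AND"}
KEYWORDS = {"ADD", "SUM"}


def expand_sum(statement):
    """Return the Fortran assignment for a new-type statement, or None if no match."""
    words = statement.split()
    if not any(word in KEYWORDS for word in words):
        return None                      # no ADD/SUM keyword: not a macro reference
    args = [w for w in words if w not in NOISE and w not in KEYWORDS]
    if len(args) != 3:
        return None                      # the template needs exactly three arguments
    first, second, target = args
    return f"      {target} = {first} + {second}"
```

Each of the three new-type statements in the example reduces to the same assignment, while the comment card and the END card are left alone because they contain no keyword.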
INTRODUCTION

An algebraic facility for LISP is quite desirable. 
Such a capability is motivated by the desire to uti- 
lize the primitive LISP arithmetic functions at the 
algebraic expression level. The requirement for a 
means of evaluating expressions might very well arise 
from applications in algebraic manipulation. Thus, 
the user, having performed some sort of transforma-
tion on an algebraic expression, might wish to have
the resulting expression evaluated for a specific set 
of values. This facility, in response to this require- 
ment, has the acronym “LEAF” (LISP Extended 
Algebraic Facility). 

Design considerations and FORTRAN language 
facilities provided by LEAF include: 

1. a list structured organization compatible with 
existing LISP; 

2. an arithmetic assignment statement;

3. a DO statement;

4. a logical IF statement;

5. an unconditional GO TO statement; and 

6. an INPUT and OUTPUT statement. 

Since LEAF is designed in the “spirit” of LISP, 
built-in functions in a given LISP system which pro-
vide for such conveniences as “pretty printing” of
functions and editing facilities may also be applied to 
LEAF programs. 

The list structured organization of LEAF 

Although the initial motivation in developing LEAF 
was to extend the LISP language, a number of other 
motivating properties of the LEAF concept make 
themselves apparent as one uses the LEAF facility. 


In order to attain compatibility with the existing 
LISP language, LEAF is essentially a dialect of 
FORTRAN in list structure. Hence, a program is a
list whose elements are statements. A simple LEAF 
program to accept two numbers from the teletype, 
determine their sum, and type out the result might be 
written as follows: 

((INPUT A B)
 (C = A + B)
 (OUTPUT C))

In similar manner, a statement is a list whose ele- 
ments are the components of that statement. In order 
to execute a statement, the LEAF interpreter typically 
looks at the keyword (e.g., INPUT), the first element 
of the statement, to determine how the statement 
should be processed. This is analogous to the LISP 
interpreter, in which the first element of a LISP 
command is a function, and the remaining elements 
of that command constitute the arguments of the 
function.

In the “assignment” function, unlike the other
LEAF commands, the keyword “=” is the second
element of the list. If the item on the left hand side
of the equal sign is an array reference, the subscripting 
can be thought of as a single list element, a sublist 
whose elements constitute the subscripts. In SDS 940 
LISP as well as in other LISP implementations, com- 
mas are perfectly acceptable list element delimiters. 
Thus, the user is free to use commas for readability 
in subscript lists if he desires, and he is not constrained 
to always delimit list elements with blanks. It is im- 
portant to note in the case of a subscripted variable 
on the left hand side of the equal sign in the assign-
ment statement that the “=” is in fact the third ele-
ment of the list. Nevertheless, recognition and proc-
essing of the assignment statement is still a relatively
straightforward procedure. 
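
The recognition rule just described can be sketched in Python, with LEAF statements represented as Python lists and a subscript list as a nested list. This is a hypothetical illustration, not the LEAF interpreter: statement labels are ignored, and the function name `classify` is an assumption.

```python
KEYWORDS = {"INPUT", "OUTPUT", "DO", "IF", "GO", "STOP", "COMMENT"}


def classify(statement):
    """Name the kind of statement from its keyword position (labels not handled)."""
    if not statement:
        raise ValueError("empty statement")
    head = statement[0]
    if isinstance(head, str) and head in KEYWORDS:
        return head                      # keyword is the first element
    for position in (1, 2):              # "=" is second, or third after a subscript list
        if len(statement) > position and statement[position] == "=":
            return "ASSIGNMENT"
    raise ValueError(f"unrecognized statement: {statement!r}")
```

The keyword test comes first, so a DO statement, which also contains an “=”, is not mistaken for an assignment.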

In addition to the properties LISP and LEAF share, 
it is interesting to note that the conveniences which 
exist for displaying and modifying LISP functions are 
also applicable to the display and modification of 
LEAF programs. The nesting of DO loops is readily 
apparent from the indented listing one obtains from 
the LISP “pretty printing” facility: 

((DO I = 1 TO 10
     (A(I) = B(I))
     (DO J = 1 TO 10
         ( . )
         ( . )
         ( . ))
     ( . )
     ( . )
     ( . )))

In like manner, one may utilize the editing facilities 
available on a given LISP system to modify a LEAF 
program with equivalent flexibility as modifying a 
LISP function. 

Justifications for a list structure 

It is worthwhile noting that the list structured ap- 
proach to the design of an algebraic language lends 
itself well to the concepts of program block structure, 
program editing, adaptability to a time sharing en- 
vironment, and, most important of all, language and 
data structure compatibility. 

Program block structure of the LEAF system is 
best illustrated by the DO statement, in which a list 
whose elements are statements constitutes the range
of the DO specification. This program block structure 
lends itself well to editing operations, since, armed 
with an indented listing of his program, one is able to 
quickly and accurately access and work with his pro- 
gram at any level. An example of program modification 
using the editing facility of SDS 940 BBN LISP is 
given in Appendix C. 

Like the LISP language, LEAF lends itself well to 
a time sharing environment, in that LEAF programs 
are easily interpreted at the source language level. 
List structured organization of LEAF programs per-
mits several users to work independently with the same
reentrant interpreter, even when two separate pro- 
grams are “intertwined” in the same storage region. 

A particularly significant observation one might
make of the LEAF language is that it possesses the
same basic structure as its data. Hence, there is no
reason why one might not wish to devise a program
which performs operations upon itself, such as the
changing of a “+” to an “*” in an arithmetic expression.
In this sense, within the framework of the LEAF
language, a statement might be thought of as an al- 
phanumeric vector whose elements are keywords, 
operators, and operands. 
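
Because a statement is just a list, such a self-modification amounts to an ordinary list operation. The following Python sketch, in which nested Python lists stand in for LEAF list structure and the name `replace_operator` is hypothetical, replaces every occurrence of one operator by another.

```python
def replace_operator(node, old, new):
    """Return a copy of a nested statement list with every `old` atom replaced."""
    if isinstance(node, list):
        return [replace_operator(item, old, new) for item in node]
    return new if node == old else node


statement = ["SUM", "=", "SUM", "+", "COUNT"]
edited = replace_operator(statement, "+", "*")   # ["SUM", "=", "SUM", "*", "COUNT"]
```

The original list is left untouched; a version that edits in place, as the LISP editor does, would be equally short.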

Fortran language facilities provided by LEAF 

1. The Assignment Statement

The assignment statement of LEAF is identical
to that of FORTRAN IV with the additional
flexibility of mixed mode arithmetic. Thus,
one may work interchangeably with both in- 
teger and real data in arithmetic expressions 
without worrying about problems of mode 
conversion, since the existing LISP floating 
point functions are designed to handle such 
situations automatically. 

2. The DO Statement 

The DO specification of LEAF is similar to 
that of PL/I. The remainder of the statement 
consists of a list whose elements as statements 
constitute the range of the DO. Any level of 
nesting is permissible, and the LISP “pretty 
printing” facility shows the nesting quite 
clearly as illustrated earlier. 

3. The Logical IF Statement 

As in PL/I, the logical IF statement consists
of an “IF” part followed by a “THEN” part. 
The “IF” part consists of two arithmetic 
expressions separated by a relational operator 
(without periods). The true or false value of 
the relation determines the execution or non- 
execution of the “THEN” part. In either event, 
the next statement in sequence is reached. 

4. The Unconditional GO TO Statement 

The GO TO statement of LEAF, like that of 
PL/I, specifies destination by means of a name 
rather than by means of a statement number 
as is the case with FORTRAN IV.

5. The INPUT Statement 

The INPUT statement consists of the key-
word “INPUT” followed by the variables to
be defined. The “RATOM” (read atom) func-
tion of SDS 940 BBN LISP permits relatively
free formatting of input data.

6. The OUTPUT Statement 

Similarly, the OUTPUT statement consists 
of the keyword “OUTPUT” followed by the
variables to be printed. The “PRINT” function 
of SDS 940 BBN LISP is utilized in this con- 
text. 

CONCLUSIONS 

The LEAF approach seems to be an answer to certain 
problems facing users who are dissatisfied with present 
day LISP and present day FORTRAN. Feasibly, pro- 
grams already written in FORTRAN IV might be con- 
verted to LEAF. The advantages of indented display 
of program nesting as well as the facilities of the 
LISP editor would certainly warrant this activity. 

Working with an algebraic language at the source 
language level has many distinct advantages. Among
these, this writer suggests, is the possibility of treating
the COMMENT statement as an executable
statement, whose text could be listed at
user request during program execution.

The author sincerely hopes that the philosophy of 
the LEAF system is given some consideration by the 
implementers of future algebraic compilers. 
APPENDIX A

Syntax description of the LEAF system

I. Fundamental Language Components:

<letter> ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<identifier> ::= <letter> {<letter> | <digit>}*
<variable> ::= <identifier>
<unsigned-integer-constant> ::= <digit> {<digit>}*
<sign> ::= + | -
<integer-constant> ::= [<sign>] <unsigned-integer-constant>
<real-constant> ::= [<sign>] <unsigned-integer-constant> . [<unsigned-integer-constant>] [<exponent-part>] |
                    [<sign>] [<unsigned-integer-constant>] . <unsigned-integer-constant> [<exponent-part>] |
                    [<sign>] <unsigned-integer-constant> <exponent-part>
<exponent-part> ::= [<sign>] {<digit>}*

II. Basic Language Elements

<program> ::= ({<statement>}*)
<statement> ::= (<comment-statement>) | (<optional-statement-label> <statement-body>)
<comment-statement> ::= COMMENT <commentary> | * <commentary> *
<optional-statement-label> ::= [<identifier>]
<statement-body> ::= <do-statement> | <input-statement> |
                     <output-statement> | <assignment-statement> |
                     <go-to-statement> | <if-statement> | <stop-statement>
<do-statement> ::= DO <index> = <initial-value> TO <final-value> (<do-block>)
<do-block> ::= {<statement>}*
<input-statement> ::= INPUT <argument-list>
<argument-list> ::= {<variable>}*
<output-statement> ::= OUTPUT <argument-list>
<assignment-statement> ::= <variable> = <arithmetic-expression>
<arithmetic-expression> ::= <term> <plus-or-minus> <arithmetic-expression> | <term>
<plus-or-minus> ::= + | -
<term> ::= <factor> <star-or-slash> <term> | <factor>
<star-or-slash> ::= * | /
<factor> ::= <variable> | <constant> | (<arithmetic-expression>)
<constant> ::= <integer-constant> | <real-constant>
<go-to-statement> ::= GO TO <identifier>
<if-statement> ::= IF <arithmetic-expression> <relational-operator>
                   <arithmetic-expression> THEN (<statement>)
<relational-operator> ::= GT|GE|LT|LE|EQ|NE
<stop-statement> ::= STOP
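
The expression, term, and factor productions lend themselves to a recursive-descent evaluator. The Python sketch below is an illustration under stated assumptions rather than a transcription of the LEAF functions: an expression is a Python list of tokens, a parenthesized subexpression is a nested list, variable values come from a dictionary `env`, and operators of equal precedence are applied left to right, as the sample results in Appendix C suggest. All names are hypothetical.

```python
import operator

ADDITIVE = {"+": operator.add, "-": operator.sub}
MULTIPLICATIVE = {"*": operator.mul, "/": operator.truediv}


def evaluate(tokens, env):
    """Evaluate a token list such as ["A", "+", "B", "*", "C"]."""
    value, rest = _expression(list(tokens), env)
    if rest:
        raise ValueError(f"unexpected tokens: {rest!r}")
    return value


def _binary_chain(tokens, env, operand, table):
    """Parse operand {op operand}, applying the operators left to right."""
    value, tokens = operand(tokens, env)
    while tokens and isinstance(tokens[0], str) and tokens[0] in table:
        apply_op = table[tokens[0]]
        right, tokens = operand(tokens[1:], env)
        value = apply_op(value, right)
    return value, tokens


def _expression(tokens, env):
    return _binary_chain(tokens, env, _term, ADDITIVE)


def _term(tokens, env):
    return _binary_chain(tokens, env, _factor, MULTIPLICATIVE)


def _factor(tokens, env):
    if not tokens:
        raise ValueError("operand expected")
    head, rest = tokens[0], tokens[1:]
    if isinstance(head, list):               # parenthesized subexpression
        return evaluate(head, env), rest
    if isinstance(head, (int, float)):       # numeric constant
        return float(head), rest
    return float(env[head]), rest            # variable lookup
```

With A through G bound to 1.0 through 7.0, `evaluate(["A", "+", "B", "*", "C"], env)` gives 7.0, in agreement with the corresponding result in Appendix C.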


APPENDIX B 


Some representative functions of the LEAF interpreter 


(STATEMENT
  (LAMBDA (COMMAND)
    (COND
      ((COMMENT-STATEMENT COMMAND)
        NIL)
      ((DO-STATEMENT COMMAND)
        NIL)
      ((INPUT-STATEMENT COMMAND)
        NIL)
      ((OUTPUT-STATEMENT COMMAND)
        NIL)
      ((ASSIGNMENT-STATEMENT COMMAND)
        NIL)
      ((GO-TO-STATEMENT COMMAND)
        NIL)
      (T (IF-STATEMENT COMMAND)))))

(COMMENT-STATEMENT
  (LAMBDA (COMMAND)
    (EQ (CAR COMMAND)
        (QUOTE COMMENT))))

(DO-STATEMENT
  (LAMBDA (COMMAND)
    (PROG (INDEX FROM TO)
      (COND
        ((NEQ (CAR COMMAND)
              (QUOTE DO))
          (RETURN NIL)))
      (SETQ INDEX (CADR COMMAND))
      (SETQ FROM (CADDDR COMMAND))
      (SETQ INDEX FROM)
      (SETQ TO (CADDDDDR COMMAND))
 LOOP (COND
        ((GREATERP INDEX TO)
          (RETURN T)))
      (LEAF (CADDDDDDR COMMAND))
      (ADD1 INDEX)
      (GO LOOP)
  )))

(INPUT-STATEMENT
  (LAMBDA (COMMAND)
    (PROG (ARGUMENT-LIST)
      (COND
        ((NEQ (CAR COMMAND)
              (QUOTE INPUT))
          (RETURN NIL)))
      (SETQ ARGUMENT-LIST (CDR COMMAND))
 LOOP (COND
        ((NULL (CAR ARGUMENT-LIST))
          (RETURN T)))
      (SET (CAR ARGUMENT-LIST)
           (RATOM NIL))
      (SETQ ARGUMENT-LIST (CDR ARGUMENT-LIST))
      (GO LOOP)
  )))

(OUTPUT-STATEMENT
  (LAMBDA (COMMAND)
    (PROG (ARGUMENT-LIST)
      (COND
        ((NEQ (CAR COMMAND)
              (QUOTE OUTPUT))
          (RETURN NIL)))
      (SETQ ARGUMENT-LIST (CDR COMMAND))
 LOOP (COND
        ((NULL (CAR ARGUMENT-LIST))
          (RETURN T)))
      (PRINT (CAAR ARGUMENT-LIST))
      (SETQ ARGUMENT-LIST (CDR ARGUMENT-LIST))
      (GO LOOP)
  )))


(ASSIGNMENT-STATEMENT
  (LAMBDA (COMMAND)
    (PROG NIL
      (COND
        ((NEQ (CADR COMMAND)
              (QUOTE =))
          (RETURN NIL)))
      (SET (CAR COMMAND)
           (ARITHMETIC-EXPRESSION (CDDR COMMAND)))
      (RETURN T)
  )))


(ARITHMETIC-EXPRESSION
  (LAMBDA (LIST)
    (PROG (VALUE)
      (SETQ POINTER LIST)
      (SETQ VALUE (TERM NIL))
 LOOP (COND
        ((NULL (CAR POINTER))
          (RETURN VALUE))
        ((EQ (CAR POINTER)
             (QUOTE +))
          (SETQ POINTER (CDR POINTER))
          (SETQ VALUE (FPLUS VALUE (TERM NIL)))
          (GO LOOP))
        ((EQ (CAR POINTER)
             (QUOTE -))
          (SETQ POINTER (CDR POINTER))
          (SETQ VALUE (FDIFFERENCE VALUE (TERM NIL)))
          (GO LOOP))
        (T (RETURN VALUE)))
  )))


(TERM
  (LAMBDA NIL
    (PROG (VALUE)
      (SETQ VALUE (FACTOR NIL))
 LOOP (COND
        ((NULL (CAR POINTER))
          (RETURN VALUE))
        ((EQ (CAR POINTER)
             (QUOTE *))
          (SETQ POINTER (CDR POINTER))
          (SETQ VALUE (FTIMES VALUE (FACTOR NIL)))
          (GO LOOP))
        ((EQ (CAR POINTER)
             (QUOTE /))
          (SETQ POINTER (CDR POINTER))
          (SETQ VALUE (FQUOTIENT VALUE (FACTOR NIL)))
          (GO LOOP))
        (T (RETURN VALUE)))
  )))


(FACTOR
  (LAMBDA NIL
    (PROG (VALUE POINTER-SAVE)
      (COND
        ((NUMBERP (CAR POINTER))
          (SETQ VALUE (CAR POINTER))
          (SETQ POINTER (CDR POINTER))
          (RETURN VALUE))
        ((ATOM (CAR POINTER))
          (SETQ VALUE (CAAR POINTER))
          (SETQ POINTER (CDR POINTER))
          (RETURN VALUE))
        (T (SETQ POINTER-SAVE POINTER)
          (SETQ VALUE (ARITHMETIC-EXPRESSION (CAR POINTER)))
          (SETQ POINTER POINTER-SAVE)
          (SETQ POINTER (CDR POINTER))
          (RETURN VALUE)))
  )))

(FDIFFERENCE 
(LAMBDA (A B) 

(FPLUS A (FMINUS B)))) 

(LEAF
  (LAMBDA (PROGRAM)
    (PROG (LOCATION LABEL)
      (SETQ LOCATION PROGRAM)
 LOOP (COND
        ((NULL (CAAR LOCATION))
          NIL)
        ((STOP-STATEMENT (CAR LOCATION))
          (RETURN (QUOTE STOP))))
      (STATEMENT (CAR LOCATION))
      (SETQ LOCATION (CDR LOCATION))
      (GO LOOP)
  )))

(STOP-STATEMENT
  (LAMBDA (COMMAND)
    (EQ (CAR COMMAND)
        (QUOTE STOP))))


APPENDIX C 

Representative applications of the LEAF system 

Examples of input statements, output statements, the assignment statement, and arithmetic expressions 

← INPUT-STATEMENT ((INPUT A B C D E F G))
1.0 2.0 3.0 4.0 5.0 6.0 7.0
T†

† The “T” indicates that the invoked function succeeded.

← OUTPUT-STATEMENT ((OUTPUT A B C D E F G))
1.000000000
2.000000000
3.000000000
4.000000000
5.000000000
6.000000000
7.000000000
T

← ASSIGNMENT-STATEMENT ((H = A + B + C + D + E + F + G))
T

← OUTPUT-STATEMENT ((OUTPUT H))
28.00000000
T

← ARITHMETIC-EXPRESSION ((A * B * C * D * E * F * G))
5040.000000

← ARITHMETIC-EXPRESSION ((A + B * C))
7.000000000

← ARITHMETIC-EXPRESSION ((A * B + C))
5.000000000

← ARITHMETIC-EXPRESSION ((((((((((((A)))))))))) - ((B)) / (C + D) * (E + F]†
-2.142857143

← ARITHMETIC-EXPRESSION (((A + B) * (C + D - F)))
3.000000000

← ARITHMETIC-EXPRESSION (((A - B + C) / (D + F * (((G))))))
4.347826087E-02

← ARITHMETIC-EXPRESSION ((A / B - C / D + F * G))
41.75000000

A program using input, output, and assignment statements 

← E(SETQQ PROGRAM ((INPUT A B) (C = A + B) (D = A - B) (E = A * B)
(F = A / B) (OUTPUT A B C D E F) (STOP)))‡

← E(LEAF PROGRAM)§
2.0 3.0
2.000000000
3.000000000
5.000000000
-1.000000000
6.000000000
6.666666667E-01


† The “]” causes a sufficient number of right parentheses to be
generated.

‡ At this point, the atom “PROGRAM” is bound with the LEAF
program as shown. The top-level function “E” merely means
“execute the given function (first element) on its arguments
without prior evaluation of those arguments.”

§ The LEAF interpreter is now applied to the designated pro- 
gram. The user satisfies the INPUT statement by typing “2.0 
3.0 (CR),” and the LEAF system responds with the desired 
output, followed by “STOP” as generated by the STOP state- 
ment. 



STOP 


A program using the DO statement 
← PRETTYPRINT (SUMMATION)‖
((SUM = 0.000000000)
 (COUNT = 0.000000000)
 (DO I = 1 TO 10 ((COUNT = COUNT + 1.000000000)
                  (SUM = SUM + COUNT)
                  (OUTPUT SUM)))
 (STOP))

← E(LEAF SUMMATION)
1.000000000
3.000000000
6.000000000
10.00000000
15.00000000
21.00000000
28.00000000
36.00000000
45.00000000
55.00000000
STOP

Modification of a program using the editing facility

← EDITV(SUMMATION)¶
EDIT
*(1 (SUM = 1.0))
*3
*7
*2
*P
(SUM = SUM + COUNT)
*(4 *)
*↑
*PP

‖ In this instance, we assume that the “SUMMATION” program
has already been defined; hence, we need only print it out using
the “PRETTYPRINT” of SDS 940 BBN LISP. Note how trans-
parent program block structure becomes via this facility.

¶ At this point we wish to edit our sample SUMMATION ex-
ample to no longer produce successive sums, but to produce
successive products or factorials. The “*” tells us we are talking
to the editor. The command “*(1 (SUM = 1.0))” updates the
first statement of our original summation program (1.0 is the
identity element for multiplication). “*3” focuses our attention
on the DO statement, “*7” focuses our attention on the range
of the DO, and “*2” focuses our attention on the second state-
ment of the range of the DO. “*P” causes that statement to be
printed out, the operation “(4 *)” causes the “+” of that state-
ment to be changed to an “*”, “*↑” returns our attention to the top
level, “*PP” “pretty prints” the edited function, and “OK” tells
the editor we are all done.
((SUM = 1.000000000)
 (COUNT = 0.000000000)
 (DO I = 1 TO 10 ((COUNT = COUNT + 1.000000000)
                  (SUM = SUM * COUNT)
                  (OUTPUT SUM)))
 (STOP))
*OK
SUMMATION

← E(LEAF SUMMATION)
1.000000000
2.000000000
6.000000000
24.00000000
120.0000000
720.0000000
5040.000000
40320.00000
362880.0000
3628799.999
STOP
INTRODUCTION

The environment provided by the multiprogrammed 
options of Operating System 360 is not the most 
suitable for debugging. It is primarily a batch system, 
with a programmer’s card deck disappearing into the 
card reader and reappearing at some future time on a 
printer. What happens in between is often impossible 
to discern; any attempt to monitor a program’s exe- 
cution (e.g., the setting of an address stop) is so compli- 
cated that it is nearly impossible. In this environment, 
debugging is difficult — at the conclusion of a program, 
the programmer either has successful execution or some 
indication of program error. If he planned ahead 
(and was lucky), his output will include not only an 
indication of the actual error, if one occurred, but 
trace information (either through OS TESTRAN 
facilities or his own printouts) to help him determine 
the problem. However, he is usually presented with a 
dump, containing a numerical reference to the com- 
pletion-codes manual. More importantly, the dump 
represents the state of the system when OS decided 
it could not continue the program’s execution; the 
user must discover why it went wrong by educated 
guesses and by “playing computer” with his program. 
The difficulty and sheer wastefulness of this procedure
are evident. In this situation, an on-line
symbolic debugger can be invaluable.

One traditional environmental requirement for on- 
line debugging is an on-line system with remote job- 
entry capabilities and file-management functions, or 
a dedicated machine and its operator console. DYDE
(Dynamic Debugger), the system described herein,
was developed in and for the former environment
using the RAND Simultaneous Graphics System. 
However, the debugger can be used in a normal OS 
batch environment using any available 2260 graphic- 
display terminal or even the on-line operator’s type- 
writer. 

The text that follows includes an external description, 
including invocation procedures and command formats, 
followed by a brief explanation of the internal operation 
of the debugger (including the “pingpong” SVC). 

DYDE 

Invocation of DYDE 

DYDE is executed as an OS job using a standard 
set of Job Control Statements (see Figure 1). These 
define the library in which DYDE resides (JOBLIB),
a library containing the program or programs to be 
debugged (SYSLIB), and a scratch file for organizing 
the symbol table (SYSUT1). In addition, any JCL 
statements defining data sets that are used by the 
program to be debugged must be included (in this 
context, DYDE contains a facility for overriding 
both the SYSLIB and the SYSUT1 ddnames if the
program being debugged needs them). Figure 2 illus-
trates a procedure for assembling, link editing, and
debugging. In any of these procedures, as soon as 
DYDE receives control, it writes out a message indi- 
cating its readiness for user commands. 

//          JOB
//JOBLIB    DD    library definition
//S1        EXEC  PGM=DYDE
//SYSLIB    DD    library definition
//SYSUT1    DD    UNIT=SYSDA,SPACE=(TRK,(5,1))
//SCOPE     DD    UNIT=040

Figure 1: Sample JCL for invocation of DYDE using the 2260 version

//          JOB
//JOBLIB    DD    library definition
//STEP1     EXEC  ASMFCL,PARM.ASM='TEST',PARM.LKED='TEST'
//ASM.SYSIN DD    *
      source deck
/*
//STEP3     EXEC  PGM=DYDE
//SYSLIB    DD    DSNAME=*.STEP1.LKED.SYSLMOD,DISP=(OLD,DELETE)
//SAMPLEDD  DD    data set description
//SYSUT2    DD    UNIT=SYSDA,SPACE=(TRK,(5,1))
//SCOPE     DD    UNIT=040

Figure 2: Sample assemble, link edit, and debug JCL
Note: The SYSLIB card points to the output of the
Link Edit step, and the user will override (using the
*DDNAME command) the SYSUT1 default
name with SYSUT2.

Device dependencies

DYDE can interact with the user through either
an IBM 2260 display station or the IBM 1052 operator’s
console. For this purpose, two versions of DYDE
exist: one for the 2260 interaction, the other for the
1052 (described in Appendix A). Because these devices
are extremely different, the mechanics of the interaction
differ significantly. However, the basic operations are
the same.

The more natural mode of operation, and the one 
for which DYDE was originally designed, uses the 
IBM 2260 graphics-display station. This is an alpha-
numeric device with a CRT capable of displaying up
to twelve lines of text; each line can contain a maximum
of 80 characters. The control unit for the 2260, the
IBM 2848, buffers typed messages, displays typed
characters, and handles display regeneration and
cursor advancement. The main CPU is presented with
an attention interrupt only when the enter key is
depressed. The OS Graphics Access Method (GAM)
schedules an asynchronous routine of DYDE that,
in turn, activates the main routine in DYDE. The
message is then read and acted upon.

The twelve-line screen face is divided into two 
logical sections: 

1. The first three lines (0, 1, and 2) are for DYDE-user
communication;

2. The remaining nine lines (3 through 11) are for data
display.

Data is written in the second area in a wrap-around 
fashion — the first data item is displayed starting on 
line 3, the next on line 4, and so on until the screen 
is full. At this point, new data is displayed starting 
again on line 3 (erasing automatically the previously 
displayed data); and line 4 is erased, providing a 
visual delimiter between old data and the most recent 
display. Each new line of data display is handled in 
this manner, with the data overwriting the oldest 
data on the screen, and the next numbered line blanked 
as a delimiter. 
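
The wrap-around discipline can be modeled in a few lines of Python. The sketch below is a hypothetical model, not DYDE code: the class `DataArea` and its methods are assumptions, and it applies one uniform rule (write on the next free line, then blank the line after it, wrapping from line 11 back to line 3), which means the oldest line is blanked as soon as line 11 is written.

```python
FIRST_DATA_LINE = 3
LAST_DATA_LINE = 11
LINE_WIDTH = 80


class DataArea:
    """Model of the nine-line data region of the 2260 screen."""

    def __init__(self):
        self.lines = {n: "" for n in range(FIRST_DATA_LINE, LAST_DATA_LINE + 1)}
        self.next_line = FIRST_DATA_LINE

    def _following(self, line):
        """Line number after `line`, wrapping from the last data line to the first."""
        return FIRST_DATA_LINE if line == LAST_DATA_LINE else line + 1

    def write(self, text):
        """Display text on the next free line and blank the line after it."""
        line = self.next_line
        self.lines[line] = text[:LINE_WIDTH]
        self.next_line = self._following(line)
        self.lines[self.next_line] = ""      # visual delimiter before the oldest data
        return line
```

After ten calls to `write`, the tenth item occupies line 3, line 4 is blank, and lines 5 through 11 still hold the third through ninth items.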

The three remaining lines — 0, 1, and 2 — are used 
for command processing. The user enters his commands 
on line 2 beginning with a start symbol (displayed as 
and usually written automatically by DYDE) 
followed by the command; this is followed by the 
attention or the enter key (displayed as | ) that 
interrupts the CPU. DYDE reads the message and 
immediately echoes (i.e., rewrites) it on line 0. This
provides not only positive verification of the trans- 
mission but also, as the user prepares to type the next 
message, a useful indicator of the last operation per- 
formed. Any data display requested is displayed on 
the first free line of the data area, and the line following 
is blanked. Finally, DYDE writes a confirmation 
message on line 1 and prepares line 2 for the next 
command by erasing it, writing the start symbol, and 
positioning the cursor at the first free space. Should 
the command be syntactically incorrect, an error 
message is written on line 1 — the echo message on 
line 0 provides the user with ready reference for dis- 
covering his error and the data region of the display 
is not disturbed. 

The discussion that follows is concerned primarily 
with the 2260 version of DYDE rather than the 1052. 
Significant differences will be noted; however, all
command and message formats, as well as operational
details, are described for the graphic-station version
rather than for the typewriter version.

Typical debugging session 

A typical debugging session begins when DYDE 
gains control and writes its READY message. At this 
point, the user can identify the program to be debugged,
perhaps overriding one or more of the ddnames that
DYDE normally uses. After the program has been 
successfully LOADed, the full spectrum of DYDE 
commands is available to the user. He may indicate 
to DYDE that he wishes execution of his program to 
be temporarily suspended when control reaches speci- 
fied locations; this is done by inserting breakpoints 
at these locations. Commands exist for modifying 
parts of his code or his data. He can then request 
DYDE to begin execution of his program. At this 
point, four events can suspend program execution 
and transfer control to DYDE:

1. Control reaching a previously defined
breakpoint;

2. Executing the pingpong supervisor call as an
assembled instruction in the user’s program
(e.g., useful when debugging an overlay program
when a particular load is not originally in
core);

3. An asynchronous interrupt from the user at
his 2260 (not available for 1052 users);

4. The program incurring a program check (e.g.,
it specifies an invalid address or operation code).

For release 17 of the operating system, a fifth event
can suspend program execution:

5. Whenever the user’s program is terminated 
abnormally by the operating system.* 

At any of the above halting points, the user may, 
for example: (1) display data in his program, (2) 
modify data, instructions, or register contents, (3) 
create hardcopy of specified areas within his program, 
(4) insert new breakpoints, or (5) delete old break- 
points. He may resume execution of his program 
from the point at which it last halted (the “current” 
breakpoint) in either the instruction step mode (exe- 
cute one instruction at a time) or in the uncontrolled 
mode, in which case only one of the above events
can suspend program execution again. In this manner, 
the user can watch his program’s execution to catch 
an error as it is occurring as well as test his program 
with sample data or temporary patches. 

DYDE commands 

The available commands that the user may issue 
fall into two general categories: (1) those that create
the proper environment for debugging the program,
and (2) those that cause actual data display from the
program.

* Items two through five are considered by DYDE to be implicit
breakpoints.

All “environmental” commands begin with an 
asterisk, followed by the command keyword. If param- 
eters are necessary, the keyword is followed by an 
equal sign; then the parameters are entered and
delimited by one of several special characters (the
selection of the special characters is made by the user).
These special characters include the slash (/); in the
command descriptions that follow, the slash is used.
Most of the commands allow different forms of the 
parameters; however, each legal form is stated ex- 
plicitly, and no other form may be used. Within the 
parameter descriptions, the user substitutes the indi- 
cated quantity for lower-case items and supplies the 
operand exactly as shown for upper-case items. Several 
commands contain a quantity called “loc” as a param- 
eter. In general, this refers to a location within the 
user’s program; its actual use is described at the end 
of this section. 

The commands (with the preceding start symbol
and the trailing end-of-message symbol omitted)
follow.

1. *NAME = pgmname

defines the linkage-editor-assigned member name of 
the program to be debugged. This program is LOADed 
from the data set defined by the SYSLIB DD card 
(or any overrides; see the *DDNAMES command below).
While LOADing the program, the debugger organizes
the symbol table, if present, and writes it out on the
data set defined by the SYSUT1 DD card (also
overridable; see the *DDNAMES command). The command
may be issued at any time; if a previous program
is in core, it is deleted, and the debugger reinitializes 
itself before LOADing the new program. 

2. *FINISH 
terminates the debugger. 

3. *PARM = parameter information 

sets up pointers so that the information following the 
equal sign is passed to the program according to normal 
OS standards.* 


* If the parameters are coded PARM = ‘XYZ’ on the EXEC 
card, the command should be *PARM = XYZ.

4. *DDNAME = syslib/sysutl/sysprint

causes the debugger to override, in its DCBs, the
default names for the library data set, the symbol-table
utility-work data set, and the data set to contain
hardcopy output. The normal names are SYSLIB, 
SYSUT1, and SYSPRINT. However, as indicated 
previously, the user may need these names for his 
program’s execution. In this case, he may, using this 
command, override one, all, or any combination of 
these three names; e.g., if the user included a DD 
card named PRIVLIB instead of the SYSLIB card,
he would issue *DDNAME = PRIVLIB. If he needed
the names SYSUT1 and SYSPRINT for his program’s
execution, he could include DD cards named A and B 
and issue the command *DDNAME = /A/B. To be 
effective, this command must be issued before the 
associated data set is needed; to issue a *NAME 
command followed by the *DDNAME would be 
meaningless unless the user wished to debug two 
programs from two different libraries. 
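
The positional override rule in the two examples above can be sketched as follows. This is an illustrative reading of the command, not DYDE's implementation; the function name is hypothetical, and the sketch assumes that an empty field leaves the corresponding name unchanged.

```python
DEFAULT_DDNAMES = ("SYSLIB", "SYSUT1", "SYSPRINT")


def apply_ddname(params, current=DEFAULT_DDNAMES, delimiter="/"):
    """Return the ddnames in effect after *DDNAME = params.

    Assumption: fields are positional (library, utility, print) and an
    empty field keeps the name already in effect.
    """
    fields = [f.strip() for f in params.split(delimiter)]
    if len(fields) > len(current):
        raise ValueError("at most three ddnames may be given")
    fields += [""] * (len(current) - len(fields))
    return tuple(new or old for new, old in zip(fields, current))


library_only = apply_ddname("PRIVLIB")   # *DDNAME = PRIVLIB
last_two = apply_ddname("/A/B")          # *DDNAME = /A/B
```

Here `library_only` replaces only the library name, and `last_two` keeps SYSLIB while substituting A and B for the other two names.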

5. (1) *SETMODE = NEXT = ON

(2) *SETMODE = NEXT = OFF

causes the debugger to change its global mode setting.
NEXT = ON tells the debugger to recognize the
next *GO (or a null command) as a command to execute
the next instruction; in this way, the background
program can be run one instruction at a time.
NEXT = OFF resets this.

6. *TRACE 

causes DYDE to print the current contents of the 
screen face into the SYSPRINT data set and, thereafter,
to print each displayed line. If DYDE is tracing
currently, *TRACE turns off tracing. 

7. *PRINT 

requests the debugger to copy everything displayed 
currently on the 2260 screen face into the SYSPRINT 
data set (this name is overridable; see the *DDNAME
command). In this way, the user may keep a history
of his debugging sessions and also develop a hardcopy 
trail of errors for later analysis. This command does not 
exist in the 1052 version. 

8. (1) *BREAK = name

(2) *BREAK = name/DEL

(3) *BREAK = /DEL

(4) *BREAK = name/loc

(5) *BREAK = name/loc/verify string

instructs the debugger to insert a breakpoint (cases 
1, 4, and 5) or delete a breakpoint (cases 2 and 3). In 
the former case, a breakpoint, with the given name, is 
inserted at a specified location. In case 1, it is inserted 
at the last displayed position; in case 4, at the named
location; and in case 5, at the named location after
DYDE has verified that the supplied string (in hex) 
matches the information that is actually in core at 
that location. If the two strings do not match, the 
location is displayed, but no breakpoint is inserted 
nor is any other change made. Case 2 tells the debugger 
to delete the named breakpoint; and case 3 tells the 
debugger to delete the current breakpoint (if one 
exists). 

9. *GO 

instructs the debugger to execute (or resume) the 
current program. If this is the first *GO issued after 
an *NAME, the program begins at the link-editor- 
assigned entry point. If the program is halted currently 
at a breakpoint, control is resumed at the breakpoint’s 
location unless a *RESUME has modified this address.
If the program has program checked (a specific
type of 360 interrupt such as an invalid address
specification), the only way to resume it without reloading
a fresh copy is through the *RESUME.

10. *RESUME = loc

specifies that when program execution is restarted, the 
debugger should resume execution at the specified 
address rather than starting at the current breakpoint. 
This is the only way to resume a program that has 
program checked. Note that great care must be exer- 
cised when using this command to guarantee that 
registers and program cells are properly set so that 
another program check does not occur.

11. *DUMP 

tells the debugger to dump itself and the program 
as if an ABEND (an abnormal termination SVC with 
the code of 100) were located at the current breakpoint
rather than the machine instruction actually
there.
12. (1) *MODIFY = ‘COND’/value

(2) *MODIFY = loc/value

(3) *MODIFY = reg no/value

(4) *MODIFY = value

(5) *MODIFY = loc/rep value/verify value

instructs the debugger to modify the program being 
debugged. In cases 1 and 3, the debugger modifies 
either the condition code set when the program resumes 
or the value of the specified register. For the condition 
code, the user supplies the mask as if he were testing
it: *MODIFY = ‘COND’/8 would cause the instruction
BC 8 to branch, whereas BC 7 would not. For
the register, the hex digits supplied replace the same 
number of digits in the register — if register 3 contains 
ABCD1234 and if the command *MODIFY = 
#3/0000 were issued, the new value would be 00001234. 
In case 2, the specified location is modified by the 
supplied value; in case 5, the specified location is 
modified by the rep value, after comparing it with 
the verify value; and in case 4, the last displayed 
location is modified. All hex digits supplied are modified
in all cases; i.e., if location 1000 contained 47F0,1234 
and if the command *MODIFY=47AF were issued, 
the new value would be 47AF,1234. Note that in 
cases 2 and 4 the value supplied may contain imbedded 
commas. 
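
The two rules above, replacement of only the digits supplied and the mask form of the condition code, can be illustrated with a short sketch. The function names are hypothetical, and nothing here reflects how DYDE stores these values internally. The mask test follows the System/360 Branch on Condition convention, in which mask bits 8, 4, 2, and 1 select condition codes 0, 1, 2, and 3. For simplicity the sketch drops imbedded commas instead of preserving them.

```python
def replace_leading_digits(old_hex, new_hex):
    """Overwrite the leading hex digits of old_hex with those supplied."""
    old = old_hex.replace(",", "")   # commas are only visual separators here
    new = new_hex.replace(",", "")
    if len(new) > len(old):
        raise ValueError("more digits supplied than the field holds")
    int(new, 16)                     # raises ValueError on a non-hex digit
    return (new + old[len(new):]).upper()


def cc_from_mask(mask):
    """Map a single-bit BC mask to the condition code it selects."""
    return {8: 0, 4: 1, 2: 2, 1: 3}[mask]


def bc_branches(mask, cc):
    """True if BC with this mask branches when the condition code is cc."""
    return bool(mask & (8 >> cc))


register_3 = replace_leading_digits("ABCD1234", "0000")
location = replace_leading_digits("47F0,1234", "47AF")
cc = cc_from_mask(8)                 # as set by *MODIFY = 'COND'/8
```

With these definitions `bc_branches(8, cc)` is true and `bc_branches(7, cc)` is false, matching the BC 8 and BC 7 behavior described in the text.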

13. (1) *CSECT = loc 

(2) *CSECT 

defines a new context for the evaluation of expressions 
used for the loc parameters. In case 1, the location 
specified is used as the new base. Case 2 resets the 
program's base to the first byte of the load module. 

Several previous commands contain a location 
specification as a parameter (signified by loc in the 
command's syntax). Wherever this is required, the 
user may code the sum or difference of any combination 
of the following elements:

1. ?hex value: hex displacement from the current
base point (see *CSECT);

2. &hex value: absolute displacement from the
first addressable byte in the machine;

3. decimal value: decimal displacement from the
current base point (see *CSECT);

4. *: location of the current breakpoint;

5. # followed by a register (e.g., #3);

6. character string: absolute location of the
specified symbol;

7. any sum or difference of the above enclosed in
parentheses (no limit on the depth), meaning
the contents of the expression within the
parentheses.

Cases 1, 2, and 7 require further explanation. When
the program is loaded initially, all displacements are
evaluated with reference to the first byte of the load
module. This is independent of the linkage-editor-assigned
entry point. Thus, ?44 refers to 68 (decimal)
bytes after the first byte of the load module. The
*CSECT command may be used to modify this; i.e.,
if a *CSECT = ?44 is issued, a subsequent reference to
?44 refers to a location 136 (decimal) bytes from the
first byte of the load module.
In this way, the user may move from one control
section to another without having to compute displacement
plus linkage-editor-assigned control section
address. This feature may be used when, for example,
one program dynamically loads another. The user
may plant a breakpoint just before the actual transfer
of control, discover the location of the entry point of
the LOADed program (it should be in a register),
and plant a breakpoint there (perhaps using the
*BREAK = /(#15) command). When the second
breakpoint is reached, the user may issue a *CSECT
= * command to set the context to the LOADed
program.*

Examples of valid loc parameters follow: 

1. (((&10))+4) would locate the current TCB
(location x‘10’ in the machine contains the
address of the communications vector table; 
the first word points to a double word in core, 
and the second word contains the address of 
the current TCB). 

2. If register 3 contained the value x‘10’,
((((#3)))+4) would accomplish the same thing.
If cell CVTLOC in the user's program contained
the value x‘10’, ((((CVTLOC)))+4) would also
locate the current TCB.

3. SAVE+4 would specify a location 4 bytes
after the symbol SAVE.

4. (#15) would specify the location pointed to 
by register 15. 
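
The examples above can be checked against a small evaluator. This is a sketch of one possible reading of the loc rules, not DYDE's parser. The `Machine` fields and the `evaluate` function are hypothetical, storage is modeled as a dictionary of fullwords, and the sketch assumes that the base point is added only when an expression contains no absolute element, so that SAVE+4 and ((&10))+4 behave as the text describes.

```python
import re
from dataclasses import dataclass, field

TOKEN = re.compile(
    r"\s*(\?[0-9A-Fa-f]+|&[0-9A-Fa-f]+|#\d+|\d+|[A-Za-z][A-Za-z0-9]*|[()+*-])")


@dataclass
class Machine:
    memory: dict                        # address -> fullword contents
    registers: list                     # sixteen general registers
    symbols: dict = field(default_factory=dict)
    base: int = 0                       # current base point (see *CSECT)
    breakpoint: int = 0                 # address of the current breakpoint


def evaluate(text, m):
    """Return the address denoted by a loc expression."""
    tokens = TOKEN.findall(text)
    if not tokens or "".join(tokens) != "".join(text.split()):
        raise ValueError("malformed loc expression")
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":                  # parentheses mean "contents of"
            kind, value = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing )")
            pos += 1
            contents = m.registers[value] if kind == "reg" else m.memory[value]
            return "abs", contents
        if tok[0] == "?":
            return "rel", int(tok[1:], 16)
        if tok[0] == "&":
            return "abs", int(tok[1:], 16)
        if tok[0] == "#":
            return "reg", int(tok[1:])
        if tok == "*":
            return "abs", m.breakpoint
        if tok.isdigit():
            return "rel", int(tok)
        if tok[0].isalpha():
            return "abs", m.symbols[tok]
        raise ValueError(f"unexpected token {tok!r}")

    def expr():
        nonlocal pos
        kind, total = term()
        absolute = kind == "abs"
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            sign = 1 if tokens[pos] == "+" else -1
            pos += 1
            other, value = term()
            if "reg" in (kind, other):
                raise ValueError("a register must be parenthesized in a sum")
            absolute = absolute or other == "abs"
            total += sign * value
        if kind == "reg":
            return "reg", total
        return "abs", total if absolute else m.base + total

    kind, value = expr()
    if pos != len(tokens):
        raise ValueError("unexpected token after expression")
    if kind == "reg":
        raise ValueError("expression names a register, not a storage address")
    return value


# Hypothetical storage layout: x'10' -> CVT -> doubleword -> TCB address.
machine = Machine(
    memory={0x10: 0x5000, 0x5000: 0x6000, 0x6004: 0x7000, 0x2000: 0x10},
    registers=[0] * 16,
    symbols={"CVTLOC": 0x2000, "SAVE": 0x3100},
    base=0x3000,
)
machine.registers[3] = 0x10
machine.registers[15] = 0x4000
tcb = evaluate("(((&10))+4)", machine)
```

Under this layout all three forms of the TCB example evaluate to the same address, and `evaluate("?44", machine)` yields the base point plus x‘44’.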

The other general category of commands requests 
displays of items or status about the program being 
debugged. These do not begin with an asterisk followed 
by a keyword, but are merely commands that specify 
what is to be displayed. These commands follow: 


* In this case, the symbol table is unavailable for the LOADed 
program. 
1. (1) ‘R’

(2) # followed by register number 

requests the debugger to display either the contents of 
all registers (case 1) or only the specified register 
(case 2). For the 2260 version of DYDE, either one 
line is written for a single register display or four lines, 
each containing the contents of four registers, are 
written. For the 1052 version, case 1 calls for writing
three messages to the operator (without reply) for
registers 0 through 11 and one WTOR (which forms
the basis for the next command) for the remaining
four registers. In either case and for either version, the
registers are displayed as they were at the last breakpoint,
including any subsequent manual modification
(or all zero if the program has not yet begun execution).

2. ‘COND’

requests a display of the current condition code as a
decimal value between 0 and 8; i.e., if the condition
code is displayed as 8, a BC 8 will branch but a BC 7
will not.

3. ‘BREAK’

requests a display of the current breakpoint infor- 
mation. All data regarding currently active breakpoints 
are displayed as well as identification of the current 
breakpoint. 

4. (1) loc

(2) loc/length

(3) loc//modifier

(4) loc/length/modifier

causes the display of a particular location (see the loc 
parameter discussion above), and defines a ‘current’
location to be used if the next *MODIFY or *BREAK 
does not specify an explicit one. If no length or modifier 
information is supplied and the loc specification con- 
tains no symbol, a 4-byte hexadecimal value is dis- 
played. If a symbol is present, its length and type 
attributes are used. A length, which must be a decimal 
less than 32, determines how many digits will form 
the final display. The modifier may be C, B, or R or 
it may be omitted. If C is coded, the value will be 
displayed as characters; B requests the display as a 
bit string of ones and zeros; and R requests a display
relative to the current base point. However, if R is
qualified by some value in parentheses (e.g.,
loc//R(BASE2)), the displayed value is relative to the value
of BASE2.

One other command to DYDE exists: the asynchronous
interrupt to the user's executing program. After
a user has indicated his desire to resume execution of 
his program, DYDE does not receive control again 
until another breakpoint is encountered. However, if 
the user provides an asynchronous interrupt (by 
simultaneously depressing the enter and shift keys on 
the 2260), DYDE is given control by OS, interrupting
the program being debugged (which is currently 
executing). DYDE plants a breakpoint where the 
program will resume and then terminates interrupt 
processing. When OS resumes the program, this break- 
point is executed, and DYDE is entered. In this manner, 
the user, after requesting resumption of his program, 
may interrupt it from the console and use all of DYDE's
facilities.

Symbol table 

To allow the user to make symbolic references to 
his program, DYDE uses the OS TESTRAN facility 
to provide a symbol table. The assembler's test option
tells it to provide the symbol table as part of its output
object module. Similarly, the linkage editor's test
option tells it to write a composite symbol table (a 
concatenation of each symbol table present in the 
input load or object modules) along with the load 
module. Under normal processing this symbol table 
is ignored; i.e., when a load module is brought into 
core, the symbol table is stripped off. However, before 
loading a program in response to an *NAME command, 
DYDE checks the disk data set containing the program 
for a symbol table. If the load module on the disk 
does not contain symbol table entries, it is simply 
loaded into core, and the user is informed that symbols 
are not available. 

However, if symbol table entries are present, they 
are read into core; an index is built through a hash 
technique; and they are written into the SYSUT1
data set. Each symbol used is present along with its
attributes of type and length and its displacement.
The composite external symbol dictionary (CESD) 
of control sections, produced by the linkage-editor, 
is used to build a map of the program so that each 
symbol may be assigned an address relative to the 
load point rather than a displacement from its control- 
section origin. As each symbol is retrieved, the first 
four characters are multiplied by the last four, and 
the middle seven bits of the resulting 64-bit product 
are used to index a 128-entry hash table. Each table
entry contains an index to a block of data on external 
storage and a displacement within that block. All 
symbols with the same hash entry are chained together, 
each pointing to the block and displacement of the 
next symbol. Each block contains enough space for 
200 symbols; the most recently referenced block is 
kept in core to minimize disk accesses. This method 
seems to work efficiently for the on-line user expecting 
rapid response. 
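
The hashing and chaining scheme just described can be sketched as follows. Several details are assumptions, since the text does not state them: names are taken to be blank-padded to eight EBCDIC characters (Python's cp037 codec is used here), each half is read as an unsigned 32-bit integer, and the "middle seven bits" are taken to be bits 28 through 34 of the product. The class and function names are hypothetical, and the in-core cache of the most recently referenced block is not modeled.

```python
BLOCK_CAPACITY = 200    # symbols per block of external storage
TABLE_SIZE = 128        # entries in the hash table


def symbol_hash(name):
    """Multiply the first four characters by the last four; keep 7 middle bits."""
    padded = name.upper().ljust(8)[:8].encode("cp037")   # assumed EBCDIC, blank padded
    first = int.from_bytes(padded[:4], "big")
    last = int.from_bytes(padded[4:], "big")
    product = first * last                               # fits in 64 bits
    return (product >> 28) & 0x7F                        # assumed choice of middle bits


class SymbolIndex:
    """Hash heads plus chained entries stored in fixed-size blocks."""

    def __init__(self):
        self.heads = [None] * TABLE_SIZE   # each head is (block, displacement) or None
        self.blocks = [[]]

    def add(self, name, address):
        if len(self.blocks[-1]) == BLOCK_CAPACITY:
            self.blocks.append([])
        slot = symbol_hash(name)
        block = len(self.blocks) - 1
        # Each entry points to the previous symbol with the same hash value.
        self.blocks[block].append((name, address, self.heads[slot]))
        self.heads[slot] = (block, len(self.blocks[block]) - 1)

    def find(self, name):
        link = self.heads[symbol_hash(name)]
        while link is not None:
            block, displacement = link
            entry_name, address, link = self.blocks[block][displacement]
            if entry_name == name:
                return address
        return None


index = SymbolIndex()
for i in range(450):
    index.add(f"S{i}", 0x1000 + 4 * i)
```

Loading 450 symbols fills two blocks and part of a third; a lookup follows at most one chain and touches only the blocks that chain passes through.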

User SVC 

One major deficiency of the 360 hardware, which
any debugging system must overcome, is the require- 
ment that any transfer of control be accompanied by 
the setting (and the destruction) of one of the sixteen 
general-purpose registers. Thus, the transfer of control 
from the debugged program at breakpoints cannot be 
accomplished merely by a branch, but must be per- 
formed by an instruction that is independent of register 
settings. The most likely candidate is a supervisor 
call (SVC) and its associated supervisor call routine, 
which can arrange for saving all sixteen registers and 
the transfer of control. However, the modification of 
the user’s program when such an SVC is inserted to 
represent a breakpoint requires that destroyed in- 
structions be executed interpretively out of line, if 
the breakpoint is to be used in the future. This is 
quite expensive since approximately 120 instructions 
are in the 360 repertoire, and each one’s interpretation 
must be coded separately. Using the EXECUTE 
instruction to execute the one modified instruction 
out of line is another possibility. However, this requires 
that all sixteen registers be properly set before the 
EXECUTE instruction is issued, and that control be 
transferred to the next instruction in the program 
without destroying any register contents. 

To solve this problem, DYDE employs a type III 
user-written supervisor call that allows both DYDE 
and the program to be debugged to reside as “co- 
routines” in the same job. This SVC can be viewed 
from the outside as having a pingpong effect on the 
control flow. Each time the SVC is issued, after an 
initial call, control is passed to the other co-routine;
i.e., the first call passes to the SVC routine an address 
within DYDE for register and program-status-word 
(PSW) save areas, one for itself and one for the program 
being debugged. Thereafter, each issuance saves the
registers and PSW of the issuing co-routine in its
area and restores the registers and PSW of the other 
member of the pair. Thus, each breakpoint inserted 
in the program being debugged calls for DYDE to 
lift and save the current instruction at that location
and to plant the two-byte SVC. When the SVC is 
executed, control passes to DYDE at an entry point 
specified by it; a note is made of the location where 
the SVC was issued. When the user indicates he wants 
his program resumed, the lifted instruction is moved 
into a special area in DYDE; the program’s resume 
address is updated to point to this location; and DYDE 
issues the SVC. This causes the program’s registers 
to be restored and control to be passed to the lifted 
instruction. If it is a branch, control passes directly 
back to the program. However, if it is not a branch, 
control will pass to the next instruction in this special 
area, which happens to be another pingpong SVC call. 
Since it was issued while the program was in execution, 
control is passed to DYDE, which notes that the SVC 
was issued from within its own address space and 
that the lifted instruction dropped through. DYDE 
then calculates the address of the instruction following 
the lifted instruction, places it in the program’s resume- 
program-status word, and reissues the SVC. This 
causes control to return to the program, which remains 
in control until another breakpoint is reached (see 
Figure 3). The only instructions that cannot be executed
when moved are the Branch and Link and the Branch 
and Link Register, which are location dependent — they 
load a specific register with the current contents of the 
location counter and then branch to another location. 
DYDE interprets both instructions. 
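
The lift-and-resume sequence can be followed in a toy model. This is a sketch under heavy simplification, not DYDE's mechanism: the machine has only an accumulator, the instruction set (ADD, B, SVC, HALT), the `SCRATCH` address, and all names are hypothetical, and the register and PSW swap performed by the real SVC routine is not modeled. Breakpoints on HALT and the Branch and Link cases are likewise left out.

```python
SVC = ("SVC",)
LENGTH = {"ADD": 4, "B": 4, "SVC": 2, "HALT": 2}   # instruction lengths in bytes
SCRATCH = 0x9000            # hypothetical address of the debugger's special area


class ToyDebugger:
    def __init__(self, program):
        self.code = dict(program)   # address -> instruction tuple
        self.lifted = {}            # breakpoint address -> saved instruction
        self.acc = 0
        self.pc = min(program)
        self.current = None         # breakpoint at which the program is halted

    def plant(self, address):
        """Lift the instruction at address and plant the SVC in its place."""
        if address not in self.lifted:
            self.lifted[address] = self.code[address]
            self.code[address] = SVC

    def _run(self):
        """Execute until an SVC or HALT is reached; return its address."""
        while True:
            op, *args = self.code[self.pc]
            if op in ("SVC", "HALT"):
                return self.pc
            if op == "B":
                self.pc = args[0]
            else:                   # ADD
                self.acc += args[0]
                self.pc += LENGTH[op]

    def go(self):
        """Start or resume the program; return the address where it stopped."""
        if self.current is None:
            stopped = self._run()
        else:
            instr = self.lifted[self.current]
            trailer = SCRATCH + LENGTH[instr[0]]
            self.code[SCRATCH] = instr      # lifted instruction, out of line
            self.code[trailer] = SVC        # catches a non-branch falling through
            self.pc = SCRATCH
            stopped = self._run()
            if stopped == trailer:          # the lifted instruction dropped through
                self.pc = self.current + LENGTH[instr[0]]
                stopped = self._run()
        self.current = stopped if stopped in self.lifted else None
        return stopped


program = {
    0x100: ("ADD", 1), 0x104: ("ADD", 10), 0x108: ("B", 0x110),
    0x10C: ("ADD", 100), 0x110: ("ADD", 1000), 0x114: ("HALT",),
}
dbg = ToyDebugger(program)
dbg.plant(0x104)
dbg.plant(0x108)
stops = [dbg.go(), dbg.go(), dbg.go()]
```

The first resumption exercises the drop-through path (a lifted ADD followed by the trailing SVC), and the second exercises the branch path, in which the lifted branch returns control directly to the program.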

APPENDIX A 
1052 Operation 

The 1052 is the normal OS operator’s console. DYDE 
uses the Write to Operator (WTO) and the Write
to Operator with Reply (WTOR) facilities to communicate
with the user. These macros allow any
program to type a message on the typewriter, or to
type a message and wait for a reply. This facility
provides a very rudimentary form of interaction; not
only is the typewriter slow, but the form of user commands
is, of necessity, burdensome. More importantly,
the console is used by OS for communications
with the operator. As such, it types out not only 
declarative but informative messages and expects 
some replies. Thus, a user wishing to use DYDE on 
a 1052 must tolerate other console activity; separate 
those messages sent to him by DYDE from other 
operator messages, usually by noting the message 
content; and tag his commands with the number of 
the message to which he is replying. 

The mechanism for these replies is bothersome. The 
user first depresses the REQUEST key, then, when 
the system responds with the proceed light, he must 
type the character R (short for REPLY), leave a 
space, and then type the following: (1) the number of
the outstanding message to which he is replying, (2)
a quote, (3) the message body, (4) a terminal quote,
and (5) the end of block. Assuming the user has received
a proceed light, and is replying to message 3,
he must type:

R 03,'THIS IS AN EXAMPLE'

followed by an end of block. 
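
As a small illustration, the reply text can be assembled mechanically. The function name is hypothetical, and the comma between the message number and the opening quote is an assumption based on the usual OS operator REPLY format; the end of block is a key press and is not part of the string.

```python
def format_reply(message_number, body):
    """Build the text typed after the proceed light for a WTOR reply."""
    if "'" in body:
        raise ValueError("this sketch does not handle quotes inside the body")
    # Assumption: two-digit message number, then a comma, then the quoted body.
    return f"R {message_number:02d},'{body}'"


reply = format_reply(3, "THIS IS AN EXAMPLE")
```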

Using this operation, DYDE initially types out a
READY message and waits for a reply. The user
responds to this message using the reply mechanism,
by issuing a legal command, and being careful to note
the number (or tag) associated with the READY 
message. DYDE responds to each request with a 
message. If the request requires more than one line, 
at least one WTO is issued, with no wait for reply; 
it is followed by a WTOR and a wait for reply. In 
this manner, DYDE can debug a program that resides 
as one of many jobs in a multiprogrammed environ- 
ment, and still keep the interference with normal 
system operations at a minimum. 


APPENDIX B 
Command abbreviations 

The following command abbreviations are available: 


Abbreviation                    Full Form

*NA                             *NAME
*M                              *MODIFY
*BR                             *BREAK
*FI                             *FINISH
*DD                             *DDNAMES
*CS                             *CSECT
*RE                             *RESUME
*TR                             *TRACE
*S                              *SETMODE
null command (i.e., just the    if mode is next, then *NEXT;
enter symbol)                   if mode is not next, then *GO
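
The table lends itself to a simple lookup. The sketch below is illustrative only; the function name and the `next_mode` flag are hypothetical, and the treatment of spacing around the equal sign is an assumption.

```python
ABBREVIATIONS = {
    "*NA": "*NAME", "*M": "*MODIFY", "*BR": "*BREAK", "*FI": "*FINISH",
    "*DD": "*DDNAMES", "*CS": "*CSECT", "*RE": "*RESUME", "*TR": "*TRACE",
    "*S": "*SETMODE",
}


def expand(command, next_mode=False):
    """Replace an abbreviated keyword by its full form.

    A null command becomes *NEXT when the NEXT mode is on, otherwise *GO.
    """
    text = command.strip()
    if not text:
        return "*NEXT" if next_mode else "*GO"
    keyword, equals, rest = text.partition("=")
    keyword = keyword.strip().upper()
    full = ABBREVIATIONS.get(keyword, keyword)   # unknown keywords pass through
    return f"{full} = {rest.strip()}" if equals else full


expanded = expand("*BR = LOOP/?44")
```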


The Multics PL/I compiler


by R. A. FREIBURGHOUSE 

General Electric Company 
Cambridge, Massachusetts 


INTRODUCTION 

The Multics PL/1 compiler is in many respects a 
“second generation” PL/1 compiler. It was built at a 
time when the language was considerably more stable 
and well defined than it had been when the first 
compilers were built.¹,² It has benefited from the
experience of the first compilers and avoids some of the 
difficulties which they encountered. The Multics com- 
piler is the only PL/1 compiler written in PL/1 and is 
believed to be the first PL/1 compiler to produce high 
speed object code. 

The language 

The Multics PL/1 language is the language defined 
by the IBM “PL/1 Language Specifications” dated 
March, 1968.¹ At the time this paper was written most
language features were implemented by the compiler 
but the run time library did not include support for 
input and output, as well as several lesser features. 
Since the multi-tasking primitives provided by the 
Multics operating system were not well suited to PL/1 
tasking, PL/1 tasking was not implemented. Inter- 
process communication (Multics tasking) may be 
performed through calls to operating system facilities. 

The system environment 

The compiler and its object programs operate within 
the Multics operating system.³,⁴,⁶ The environment
provided by this system includes a virtual two dimen- 
sional address space consisting of a large number of 
segments. Each segment is a linear address space whose 
addresses range from 0 to 64K. The entire virtual store 
is supported by a paging mechanism which is invisible
to the program. Each program operating in this
environment consists of two segments: a text segment 
containing a pure re-entrant procedure, and a linkage 
segment containing out-references (links), definitions 
(entry names), and static storage local to the program. 
The text segment of each program is sharable by all 
other users on the system. Linking to a called program is 
normally done dynamically during program execution. 

Implementation techniques 

The entire compiler and the Multics operating system 
were written in EPL, a large subset of PL/1 containing 
most of the complex features of the language. The EPL 
compiler was built by a team headed by M. D. McIlroy
and R. Morris of Bell Telephone Laboratories. Several 
members of the Multics PL/1 project modified the 
original EPL compiler to improve its object code 
performance, and utilized the knowledge acquired from 
this experience in the design of the Multics PL/1 
compiler. EPL and Multics PL/1 are sufficiently 
compatible to allow the Multics PL/1 compiler to 
compile itself and the operating system. 

The Multics PL/1 compiler was built and de-bugged 
by four experienced system programmers in 18 months. 
All program preparation was done on-line using the 
CTSS time-sharing system at MIT. Most de-bugging 
was done in a batch mode on the GE645, but final 
de-bugging was done on-line using Multics. 

The extremely short development time of 18 months 
was made possible by these powerful tools. The same 
design programmed in a macro-assembly language using 
card input and batched runs would have required twice 
as much time, and the result would have been extremely 
unmanageable. 
Design objectives

The project’s design decisions and choice of techniques 
were influenced by the following objectives: 

1. A correct implementation of a reasonably 
complete PL/1 language. 

2. A compiler which produced relatively fast object 
code for all language constructs. For similar 
language constructs, the object code was ex- 
pected to equal or exceed that produced by most 
Fortran or COBOL compilers. 

3. Object program compatibility with EPL object 
programs and other Multics languages. 

4. An extensive compile time diagnostic facility. 

5. A machine independent compiler capable of 
bootstrapping itself onto other hardware. 

The compiler’s size and speed were considered less 
important than the above mentioned objectives. Each 
phase of the original compiler occupies approximately
32K, but after the compiler has compiled itself that 
figure will be about 24K. The original compiler was 
about twice as slow as the Multics Fortran compiler. 
The bootstrapped version of the PL/1 compiler is 
expected to be considerably faster than the original 
version but it will probably not equal the speed of 
Fortran. 

An overview of the compiler 

The Multics PL/1 compiler is designed along 
traditional lines. It is not an interactive compiler nor 
does it perform partial compilations. The compiler 
translates PL/1 external procedures into relocatable 
binary machine code which may be executed directly or 
which may be bound together with other procedures 
compiled by any Multics language processor. 

The notion of a phase is particularly useful when 
discussing the organization of the Multics PL/1 
compiler. A phase is a set of procedures which performs 
a major logical function of compilation, such as syntac- 
tic analysis. A phase is not necessarily a memory load or 
a pass over some data base although it may, in some 
cases, be either or both of these things. 

The dynamic linking and paging facilities of the 
Multics environment have the effect of making avail- 
able in virtual storage only those specific pages of those 
particular procedures which are referenced during an 
execution of the compiler. A phase of the Multics PL/1 
compiler is therefore only a logical grouping of pro- 
cedures which may call each other. The PL/1 compiler 
is organized into five phases: Syntactic Translation, 
Declaration Processing, Semantic Translation, Optimi- 
zation, and Code Generation. 


The internal representation 

The internal representation of the program being 
compiled serves as the interface between phases of the 
compiler. The internal representation is organized into 
a modified tree structure (the program tree) consisting 
of nodes which represent the component parts of the 
program, such as blocks, groups, statements, operators, 
operands, and declarations. Each node may be logically 
connected to any number of other nodes by the use of 
pointers. 

Each source program block is represented in the 
program tree by a block node which has two lists 
connected to it: a statement list and a declaration list. 
The elements of the declaration list are symbol table 
nodes representing declarations of identifiers within that 
block. The elements of the statement list are nodes 
representing the source statements of that block. Each 
statement node contains the root of a computation tree 
which represents the operations to be performed by that 
statement. This computation tree consists of operator 
nodes and operand nodes. 

The operators of the internal representation are 
n-operand operators whose meaning closely parallels 
that of the PL/1 source operators. The form of an 
operand is changed by certain phases, but operands 
generally refer to a declaration of some variable or 
constant. Each operand also serves as the root of a 
computation tree which describes the computations 
necessary to locate the item at run time. 
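
The node kinds described above can be pictured with a small data-structure sketch. It is written in Python purely for illustration, and every class and field name (Symbol, Operand, Operator, Statement, Block) is an assumption made here, not the compiler's actual layout. The computation tree that locates each operand at run time is left out. The tree built at the end corresponds to the statement RETURN (N*F(N-1)) of a recursive procedure F.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Symbol:
    """Symbol table node: one declaration of an identifier or constant."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Operand:
    """Operand node: refers to the declaration of a variable or constant."""
    declaration: Symbol


@dataclass
class Operator:
    """n-operand operator node; each operand is an operand or a sub-tree."""
    name: str
    operands: List[Union["Operator", Operand]] = field(default_factory=list)


@dataclass
class Statement:
    """Statement node holding the root of its computation tree."""
    kind: str
    root: Optional[Operator] = None


@dataclass
class Block:
    """Block node with its declaration list and its statement list."""
    name: str
    declarations: List[Symbol] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)


def count_operators(node: Union[Operator, Operand]) -> int:
    """Count the operator nodes in a computation tree."""
    if isinstance(node, Operand):
        return 0
    return 1 + sum(count_operators(child) for child in node.operands)


n = Symbol("N", ["fixed"])
f = Symbol("F", ["entry"])
one = Symbol("1", ["constant"])

# RETURN (N*F(N-1))
tree = Operator("return", [
    Operator("multiply", [
        Operand(n),
        Operator("call", [Operand(f),
                          Operator("subtract", [Operand(n), Operand(one)])]),
    ]),
])
block_f = Block("F", declarations=[n], statements=[Statement("return", tree)])
```

Both references to N share one Symbol object, which mirrors the statement that operands refer to a declaration rather than carrying their own copy of the attributes.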

This internal representation is machine independent 
in that it does not reflect the instruction set, the 
addressing properties, or the register arrangement of 
the GE645. The first four phases of the compiler are also 
machine independent since they deal only with this 
machine independent internal representation. Figure 1 
shows the internal representation of a simple program. 

Syntactic translation 

Syntactic analysis of PL/1 programs is slightly more 
difficult than syntactic analysis of other languages such 
as Fortran. PL/1 is a larger language containing more 
syntactic constructs, but it does not present any 
significantly new problems. The syntactic translator 
consists of two modules called the lexical analyzer and 
the parse. 

Lexical analysis 

The lexical analyzer organizes the input text into 
groups of tokens which represent a statement. It also 
creates the source listing file and builds a token table 
which contains the source representation of all tokens in 
the source program. A token is an identifier, a constant, 
an operator or a delimiter. The lexical analyzer is called 
by the parse each time the parse wants a new statement. 

[Figure 1 diagram: for a sample procedure FACT and its 
internal procedure F, the figure shows the block node for 
FACT with statement nodes for the DO, the CALL, the DO end 
and the FACT end, and symbol table nodes for PRINT and F; 
and the block node for F with statement nodes for the IF 
clause, the THEN clause, the RETURN and the F end, and a 
symbol table node for N.] 

Figure 1 — The internal representation of a program. 
The example is greatly simplified. Only the statements 
of procedure F are shown in detail. 

    PRINT: PROC (MESSAGE, VALUE); 
    DCL MESSAGE CHAR (*), VALUE FIXED; 
    CALL DISPLAY (MESSAGE || VALUE); 
    END; 

[The token table produced by the lexical analyzer for 
this program is not recoverable.] 

Figure 2 — The output of the lexical analyzer. 

The lexical analyzer is an approximation to a finite 
state machine. Since the lexical analyzer must produce 
output as well as recognize tokens, action codes are 
attached to the state transitions of the finite state 
machine. These action codes result in the concatenation 
of individual characters from the input until a 
recognized token is formed. Constants are not converted 
to their internal format by the lexical analyzer. They are 
converted by the semantic translator to a format which 
depends on the context in which the constant appears. 

The token table produced by the lexical analyzer 
contains a single entry for each unique token in the 
source program. Searching of the token table is done 
utilizing a hash coded scheme which provides quick 
access to the table. Each token table entry contains a 
pointer which may eventually point to a declaration of 
the token. For each statement, the lexical analyzer 
builds a vector of pointers to the tokens which were 
found in the statement. This vector serves as the input 
to the parse. Figure 2 shows a simple example of lexical 
analysis. 
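
The scheme described in this section can be sketched as a table driven scanner. The sketch below is an illustration in Python under several assumptions: the state names, character classes, action codes and the function names (char_class, enter_token, lex_statement) are invented here, a Python dictionary stands in for the hash coded token table, and multi-character operators are not handled. Constants are kept as source text, as the paper describes.

```python
def char_class(ch):
    """Map a character to the class used to index the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"


# (state, character class) -> (next state, action code)
TRANSITIONS = {
    ("start", "letter"): ("ident", "append"),
    ("start", "digit"): ("number", "append"),
    ("start", "space"): ("start", "skip"),
    ("start", "other"): ("start", "single"),
    ("ident", "letter"): ("ident", "append"),
    ("ident", "digit"): ("ident", "append"),
    ("ident", "space"): ("start", "emit"),
    ("ident", "other"): ("start", "emit"),
    ("number", "digit"): ("number", "append"),
    ("number", "letter"): ("start", "emit"),
    ("number", "space"): ("start", "emit"),
    ("number", "other"): ("start", "emit"),
}


def enter_token(token_table, text):
    """Return the unique token table entry for this spelling."""
    entry = token_table.get(text)
    if entry is None:
        # The declaration pointer is filled in by later phases.
        entry = {"text": text, "declaration": None}
        token_table[text] = entry
    return entry


def lex_statement(text, token_table):
    """Return the vector of token table entries for one statement."""
    vector, state, pending, i = [], "start", "", 0
    while i < len(text):
        ch = text[i]
        next_state, action = TRANSITIONS[(state, char_class(ch))]
        if action == "append":        # concatenate onto the current token
            pending += ch
            i += 1
        elif action == "skip":        # white space between tokens
            i += 1
        elif action == "single":      # one-character operator or delimiter
            vector.append(enter_token(token_table, ch))
            i += 1
        else:                         # "emit": finish token, rescan ch
            vector.append(enter_token(token_table, pending))
            pending = ""
        state = next_state
    if pending:
        vector.append(enter_token(token_table, pending))
    return vector


token_table = {}
vector = lex_statement("S = S+X(I);", token_table)
```

The two occurrences of S in the statement yield the same table entry, so the vector holds two references to one entry rather than two copies of the spelling.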


The parse 

The parse consists of a set of possibly recursive 
procedures, each of which corresponds to a syntactic 
unit of the language. These procedures are organized to 
perform a top down analysis of the source program. As 
each component of the program is recognized, it is 
transformed into an appropriate internal representation. 
The completed internal representation is a program tree 
which reflects the relationships between all of the 
components of the original source program. Figure 3 
shows the results of the parse of a simple program. 

Syntactic contexts which yield declarative informa- 
tion are recognized by the parse, and this information is 
passed to a module called the context recorder which 
constructs a data base containing this information. 
Declare statements are parsed into partial symbol table 
nodes which represent declarations. 

The problem of backup 

The top down method of syntactic analysis is used 
because of its simplicity and flexibility. The use of a 
simple statement recognition algorithm made it possible 
to eliminate all backup. The statement recognizer 
identifies the type of each statement before the parse of 
that statement is attempted. The algorithm used by 
this procedure first attempts to recognize assignment 
statements using a left to right scan which looks for 
token patterns which are roughly analogous to X = or 
X ( ) = . If a statement is not recognized as an 
assignment, its leading token is matched against a 
keyword list to determine the statement type. This 
algorithm is very efficient and is able to positively 
identify all legal statements without requiring keywords 
to be reserved. 

[Figure 3 diagram: for the program 

    SUM: PROC(X,N) FLOAT; 
    DCL (S INITIAL(0),X(1000)) FLOAT; 
    DCL (I,N) FIXED; 
    DO I = 1 TO N; 
    S = S+X(I); 
    END; 
    RETURN(S); 
    END SUM; 

the figure shows the block node for SUM, its statement 
nodes, the symbol table nodes for N, I, X and S, and 
the token table.] 

Declaration processing 

PL/1 declaration processing is complicated by the 
great variety of data attributes and by the context 
sensitive manner in which they are derived. Two 
modules, the context processor and the declaration 
processor, process declarative information gathered by 
the parse. 

The context processor 

The context processor scans the data base containing 
contextually derived attributes produced during the 
parse by the context recorder. It either augments the 
partial symbol table created from declare statements or 
creates new declarations having the same format as 
those derived from declare statements. This activity 
creates contextual and implicit declarations. 

The declaration processor 

The declaration processor develops sufficient information 
about the variables of the program so that they 
may be allocated storage, initialized and accessed by the 
program’s operators. It is organized to perform three 
major functions: the preparation of accessing code, the 
computation of each variable’s storage requirements, 
and the creation of initialization code. 

The declaration processor is relatively machine 
independent. All machine dependent characteristics, 
such as the number of bits per word and the alignment 
requirements of data types, are contained in a table. 
All computations or statements produced by the 
declaration processor have the same internal representation 
as source language expressions or statements. Later 
phases of the compiler do not distinguish between them. 

The use of based references by the declaration 
processor 

The concept of a based reference is useful to the 
understanding of PL/1 data accessing and the imple- 
mentation of a number of language features. A based 
declaration of the form DCL A BASED is referenced 
by a based reference of the form P -> A, where P is a 
pointer to the storage occupied by a value whose 
description is given by the declaration of A. Multiple 
instances of data having the characteristics of A can be 
referenced through the use of unique pointers, i.e., 
Q -> A, R -> A, etc. 

The declaration processor implements a number of 
language features by transforming them into suitable 
based declarations. Automatic data whose size is 
variable is transformed into a based declaration. 

For example the declaration: 

DCL A(N) AUTO; 

becomes 

DCL A(N) BASED (P); 

where: P is a compiler produced pointer which is set 
upon entry to the declaring block. 

Based declarations are also used to implement 
parameters. For example: 

X: PROC (C); DCL C; 

becomes 

X: PROC (P); DCL C BASED (P); 

where: P is a pointer which points to the argument 
corresponding to the parameter C. 

Data accessing 

The address of an item of PL/1 data consists of three 
basic parts: a pointer to some storage location, a word 
offset from that location and a bit offset from the word 
offset. Either or both offsets may be zero. The term 
“word” is understood to refer to the addressable unit 
of a computer’s storage. 

Example 1 

DCL A AUTO; 

The address of A consists of a pointer to the declaring 
block’s automatic storage, a word offset within that 
automatic storage and a zero bit offset. 

Example 2 

DCL 1 S BASED (P), 

2 A BIT(5), 

2 B BIT(N); 

When referenced by P -> B, the address of B is a 
pointer P, a zero word offset and a bit offset of 5. The 
word offset may include the distance from the origin of 
the item’s storage class, as was the case with the first 
example, or it may be only the distance from the 
level-one containing structure, as it was in the last 
example. The term “level-one” refers to all variables 
which are not contained within structures. Subscripted 
array element references, A(K, J), or sub-string 
references, SUBSTR(X, K, J), may also be expressed 
as offsets. 

Offset expressions 

The declaration processor constructs offset expres- 
sions which represent the distance between an element 
of a structure and the data origin of its level-one 
containing structure. If an offset expression contains 
only constant terms, it is evaluated by the declaration 
processor and results in a constant addressing offset. If 
the offset expression contains variable terms, the 
expression results in the generation of accessing 
instructions in the object program. The discussion which 
follows describes the efficient creation of these offset 
expressions. 


Given a declaration of the form: 

DCL 1 S, 

2 A BIT(M), 

2 B BIT (5), 

2 C FLOAT; 

The offset of A is zero, the offset of B is M bits, and the 
offset of C is M + 5 bits rounded upward to the 
nearest word boundary. 

In general, the offset of the nth item in a structure is: 

$$b_n(c_{n-1}(s_{n-1}) + b_{n-1}(c_{n-2}(s_{n-2}) + b_{n-2}(\cdots b_3(c_2(s_2) + b_2(c_1(s_1))) \cdots)))$$ 

where: $b_k$ is a rounding function which expresses the 
boundary requirement of the kth item. 

$s_k$ is the size of the kth item. 

$c_k$ is the conversion factor necessary to convert 
$s_k$ to some common units such as bits. 

The declaration processor suppresses the creation of 
unnecessary conversion functions ($c_k$) and boundary 
functions ($b_k$) by keeping track of the current units and 
boundary as it builds the expression. As a result the 
offset expressions of the previous example do not contain 
conversion functions and boundary functions for A 
and B. 
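
The nested offset formula can be evaluated one item at a time, since the offset of each item is the rounded-up sum of the previous item's offset and converted size. The Python sketch below illustrates this for constant sizes only. The function names, the tuple layout, the 36-bit word length and the one-word size of FLOAT are assumptions made for the example, not facts taken from the compiler.

```python
WORD_BITS = 36  # assumed word length in bits


def round_up(value, boundary):
    """Boundary function b_k: round value up to a multiple of boundary."""
    return -(-value // boundary) * boundary


def structure_offsets(items):
    """Bit offset of each item in a structure.

    items is a list of (size, unit_bits, boundary_bits) tuples:
    size is s_k, unit_bits plays the role of the conversion factor c_k,
    and boundary_bits is the alignment enforced by b_k.
    """
    offsets = []
    end_of_previous = 0
    for size, unit_bits, boundary_bits in items:
        start = round_up(end_of_previous, boundary_bits)
        offsets.append(start)
        end_of_previous = start + size * unit_bits
    return offsets


M = 7  # a sample value for the variable length of A
example = [
    (M, 1, 1),                  # 2 A BIT(M)
    (5, 1, 1),                  # 2 B BIT(5)
    (1, WORD_BITS, WORD_BITS),  # 2 C FLOAT, assumed to occupy one word
]
offsets = structure_offsets(example)
```

With M = 7 the offsets are 0 for A, M = 7 for B, and M + 5 = 12 rounded up to the next word boundary for C, matching the three cases listed for the declaration of S.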

During the construction of the offset expression, the 
declaration processor separates the constant and varia- 
ble terms so that the addition of constant terms is done 
by the compiler rather than by accessing code in the 
object program. The following example demonstrates 
the improvement gained by this technique. 

DCL 1 S, 

2 A BIT (5), 

2 B BIT(K), 

2 C BIT(6), 

2 D BIT(10); 

The offset of D is K+11 instead of 5+K+6. 

The word offset and the bit offset are developed 
separately. Within each offset, the constant and varia- 
ble parts are separated. These separations result in the 
minimization of additions and unit conversions. If the 
declaration contains only constant sizes, the resulting 
offsets are constant. If the declaration contains expres- 
sions, then the offsets are expressions containing the 
minimum number of terms and conversion factors. 
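
The separation of constant and variable terms can be sketched as follows. This is a simplified Python illustration that handles bit offsets only and ignores boundary and conversion functions; the names member_offsets and show, and the representation of an offset as a pair (constant, variable terms), are assumptions made here.

```python
def member_offsets(members):
    """Offset of each member as (constant part, tuple of variable terms).

    members is a list of (name, size) pairs where size is either an
    integer (a constant size) or a string naming a variable.
    """
    constant, variable_terms = 0, []
    offsets = {}
    for name, size in members:
        offsets[name] = (constant, tuple(variable_terms))
        if isinstance(size, int):
            constant += size              # folded by the compiler
        else:
            variable_terms.append(size)   # left for the object program
    return offsets


def show(offset):
    """Render an offset with the variable terms first."""
    constant, variable_terms = offset
    terms = list(variable_terms)
    if constant or not terms:
        terms.append(str(constant))
    return "+".join(terms)


# DCL 1 S, 2 A BIT(5), 2 B BIT(K), 2 C BIT(6), 2 D BIT(10);
offsets = member_offsets([("A", 5), ("B", "K"), ("C", 6), ("D", 10)])
```

Here show(offsets["D"]) gives K+11: the constants 5 and 6 were added while the expression was being built, so the object program performs a single addition.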

The development of size and offset expressions at 
compile time enables the object program to access data 
without the use of data descriptors or “dope vectors.” 6 
Most existing PL/1 implementations make extensive 
use of such descriptors to access data whose size or 
offsets are variable. Unless these descriptors are 
implemented by hardware, their use results in rather 
inefficient object code. The Multics PL/1 strategy of 
developing offset expressions from the declarations 
results in accessing code similar to that produced for 
subscripted array references. This code is generally 
more efficient than code which uses descriptors. 

In general, the offset expressions constructed by the 
declaration processor remain unchanged until code 
generation. Two cases are exceptions to this rule: 
subscripted array references, A(K,J), and sub-string 
references, SUBSTR(X,K,J). Each subscripted 
reference or sub-string reference is a reference to a 
unique sub-datum within the declared datum and, 
therefore, requires a unique offset. The semantic 
translator constructs these unique offsets using the 
subscripts from the reference and the offset prepared by 
the declaration processor. 

Allocation 

The declaration processor does not allocate storage 
for most classes of data, but it does determine the 
amount of storage needed by each variable. Variables 
are allocated within some segment of storage by the code 
generator. Storage allocation is delayed because, during 
semantic translation and optimization, additional dec- 
larations of constants and compiler created variables 
are made. 

Initialization 

The declaration processor creates statements in the 
prologue of the declaring block which will initialize 
automatic data. It generates DO statements, IF 
statements and assignment statements to accomplish 
the required initialization. 

The expansion of the initial attribute for based and 
controlled data is identical to that for automatic data 
except that the required statements are inserted into 
the program at the point of allocation rather than in the 
prologue. 

Since array bounds and string sizes of static data are 
required by the language to be constant, and since all 
values of the initial attribute of static data must be 
constant, the compiler is able to initialize the static data 
at compile time. The initialization is done by the code 
generator at the time it allocates the static data. 


Semantic translation 

The semantic translator transforms the internal 
representation so that it reflects the attributes (seman- 
tics) of the declared variables without reflecting the 
properties of the object machine. It makes a single scan 
over the internal representation of the program. A com- 
piler, which had no equivalent of the optimizer phase 
and which did not separate the machine dependencies 
into a separate phase, could conceivably produce object 
code during this scan. 

Organization of the semantic translator 

The semantic translator consists of a set of recursive 
procedures which walk through the program tree. The 
actions taken by these procedures are described by the 
general terms: operator transformation and operand 
processing. Operator transformation includes the crea- 
tion of an explicit representation of each operator’s 
result and the generation of conversion operators for 
those operands which require conversion. Operand 
processing determines the attributes, size and offsets of 
each operator’s operands. 

Operator transformation 

The meaning of an operator is determined by the 
attributes of its operands. This meaning specifies which 
conversions must be performed on the operands, and it 
decides the attributes of the operator’s result. 

An operator’s result is represented in the program 
tree by a temporary node. Temporary nodes are a 
further qualification of the original operator. For 
example, an add operator whose result is fixed-point is a 
distinct operation from an add operator whose result is 
floating-point. There is no storage associated with 
temporaries at this stage; they are allocated either core or register 
storage by the code generator. A temporary’s size is a 
function of the operator’s meaning and the sizes of the 
operator’s operands. A temporary, representing the 
intermediate result of a string operation, requires an 
expression to represent its length if any of the string 
operator’s operands have variable lengths. 

Operand processing 

Operands consist of sub-expressions, references to 
variables, constants, and references to procedure names 
or built-in functions. Sub-expression operands are 
processed by recursive use of operator transformation 
and operand processing. Operand processing converts 
constants to a binary format which depends on the 
context in which the constant was used. References to 
variables or procedure names are associated with their 
appropriate declaration by the search function. After 
the search function has found the appropriate declara- 
tion, the reference may be further processed by the 
subscriptor or function processor. 

The search function 

During the parse, it is not possible for references to 
source program variables to know the declared attributes 
of the variable because the PL/1 language allows 
declarations to follow their use. Therefore, references to 
source program variables are parsed into a form which 
contains a pointer to a token table entry rather than to 
a declaration of the variable. Figure 3 shows the output 
of the parse. The search function finds the proper 
declaration for each reference to a source program 
variable. The effectiveness of the search depends heavily 
on the structure of the token table and the symbol table. 
After declaration processing, the token table entry 
representing an identifier contains a list of all the 
declarations of that identifier. See Figure 4. 

[Figure 4 diagram: for the program 

    TOP: PROC; 
    DCL B POINTER; 
    BEGIN; 
    DCL B FLOAT; 
    BEGIN; 
    DCL B FIXED; 
    END; 
    END; 
    END; 

the figure shows the token table and the symbol table 
nodes for the declarations of B.] 

Figure 4 — The relationship between the token table and 
the symbol table 

[Figure 5 diagram: for a procedure DEM containing 

    DCL 1 S, 
    2 A(N) FLOAT, 
    2 B(M) FIXED; 

the figure shows the block node for DEM, the symbol table 
node and reference node for B with the word offset 
expression built by the declaration processor, and a 
statement node for an assignment whose reference node 
for B carries the word offset expression built by the 
semantic translator.] 

Figure 5 — A simplified diagram showing the effects of 
subscripting 

The search function first tries to find a declaration 
belonging to the block in which the reference occurred. 
If it fails to find one, it looks for a declaration in the next 
containing block. This process is repeated until a 
declaration is found. Since the number of declarations 
on the list is usually one, the search is quite fast. In its 
attempt to find the appropriate declaration, the search 
function obeys the language rules regarding structure 
qualification. It also collects any subscripts used in the 
reference and places them into a subscript list. Depend- 
ing on the attributes of the referenced item, the 
subscript list serves as input to the function processor or 
subscriptor. 

The declaration processor creates offset expressions 
and size expressions for all variables. These expressions, 
known as accessing expressions, are rooted in a reference 
node which is attached to a symbol table node. The 
reference node contains all information necessary to 
access the data at run time. The search function 
translates a source reference into a pointer to this 
reference node. See Figure 5. 
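
The block-by-block lookup performed by the search function can be sketched as below. This Python illustration covers only the walk outward through containing blocks; structure qualification and the collection of subscripts are omitted. The class names, the attribute strings and the representation of a token table entry as a plain list of declarations are assumptions made for the example.

```python
class Block:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # containing block, None for the outermost


class Declaration:
    def __init__(self, identifier, block, attributes):
        self.identifier = identifier
        self.block = block            # block in which the declaration occurs
        self.attributes = attributes


def search(token_entry, block):
    """Find the declaration visible from block, or None if there is none.

    token_entry is the list of all declarations of one identifier,
    as held by that identifier's token table entry.
    """
    while block is not None:
        for declaration in token_entry:   # usually a list of length one
            if declaration.block is block:
                return declaration
        block = block.parent              # try the next containing block
    return None


# The nesting of Figure 4: three declarations of B in nested blocks.
top = Block("TOP")
first_begin = Block("first BEGIN", parent=top)
second_begin = Block("second BEGIN", parent=first_begin)

token_entry_for_b = [
    Declaration("B", top, "POINTER"),
    Declaration("B", first_begin, "FLOAT"),
    Declaration("B", second_begin, "FIXED"),
]
```

A reference to B inside the second BEGIN block resolves to the FIXED declaration, while a reference from a block that declares no B of its own falls through to the nearest containing block that does.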

Subscripting 

Since each subscripted reference is unique, its offset 
expression is unique. To reflect this in the internal 
representation, the subscriptor creates a unique refer- 
ence node for each subscripted reference. See Figure 6. 
The following discussion shows the relationship between 
the declared array bounds, the element size, the array 
offset and subscripts. 

Let us consider the case of an array declared: 

$$a(l_1:u_1,\ l_2:u_2,\ \ldots,\ l_n:u_n)$$ 

Its element size is s and its offset is b. 

The multipliers for the array are defined as: 

$$m_n = s$$ 
$$m_{n-1} = (u_n - l_n + 1)\,s$$ 
$$m_{n-2} = (u_{n-1} - l_{n-1} + 1)\,m_{n-1}$$ 
$$\vdots$$ 
$$m_1 = (u_2 - l_2 + 1)\,m_2$$ 

The offset of a reference $a(i_1, i_2, \ldots, i_n)$ is computed as: 

$$v + \sum_{j=1}^{n} i_j m_j$$ 

where: v is the virtual origin. The virtual origin is the 
offset obtained by setting the subscripts equal to zero. 
It serves as a convenient base from which to compute 
the offset of any array element. 

During the construction of all expressions, the 
constant terms are separated from the variable terms 
and all constant operations are performed by the 
compiler. Since the virtual origin and the multipliers are 
common to all references, they are constructed by the 
declaration processor and are repeatedly used by the 
subscriptor. 

Arrays of PL/1 structures which contain arrays may 
result in a set of multipliers whose units differ. The 
declaration: 

DCL 1 S(10), 

2 A PTR, 

2 B(10) BIT (2); 

yields two multipliers of different units. The first 
multiplier is the size of an element of S in words, while 
the second multiplier is the size of an element of B 
in bits. 

[Figure diagram: for the program 

    FIG: PROC; 
    DCL (X, Y, Z) FLOAT; 
    X = Y+Z; 
    END; 

only the label "symbol table" is recoverable.] 

Array parameters which may correspond to an array 
cross section argument must receive their multipliers 
from an argument descriptor. Since the arrangement 
of the cross section elements in storage is not known to 
the called program, it cannot construct its own multi- 
pliers and must use multipliers prepared by the calling 
program. Note that the current definition of PL/1 
allows any array parameter to receive a cross section 
argument. 
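
The multipliers and virtual origin defined earlier in this section can be worked through numerically. The Python sketch below is an illustration with invented function names and a single unit of measure throughout. The expression used for the virtual origin, v = b minus the sum of l_j m_j, is not stated in the text; it is an assumption derived from the requirement that the element whose subscripts all equal their lower bounds lies at offset b.

```python
def multipliers(bounds, element_size):
    """Multipliers m_1..m_n for bounds [(l_1, u_1), ..., (l_n, u_n)]."""
    n = len(bounds)
    m = [0] * n
    m[n - 1] = element_size                  # m_n = s
    for j in range(n - 2, -1, -1):
        low, high = bounds[j + 1]
        m[j] = (high - low + 1) * m[j + 1]   # m_j = (u_{j+1}-l_{j+1}+1) m_{j+1}
    return m


def virtual_origin(bounds, mults, base):
    """Offset that a reference with all subscripts zero would have."""
    return base - sum(low * m for (low, _), m in zip(bounds, mults))


def element_offset(subscripts, mults, origin):
    """v plus the sum of i_j * m_j."""
    return origin + sum(i * m for i, m in zip(subscripts, mults))


# a(1:3, 0:4) with element size s = 2 and offset b = 10
bounds = [(1, 3), (0, 4)]
mults = multipliers(bounds, 2)
v = virtual_origin(bounds, mults, 10)
first = element_offset((1, 0), mults, v)
last = element_offset((3, 4), mults, v)
```

Because mults and v depend only on the declaration, they are computed once, and each subscripted reference then costs one multiplication and one addition per subscript.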

The function processor 

An operand which is a reference to a procedure is 
expanded by the function processor into a call operator 
and possible conversion operators. Built-in function 
references result in new operators or are translated into 
expressions consisting of operators and operands. 

Generic procedure references 

A generic entry name represents a family of pro- 
cedures whose members require different types of 
arguments. 

DCL ALPHA GENERIC (BETA ENTRY (FIXED), 
                   GAMMA ENTRY (FLOAT)); 

A reference to ALPHA(X) will result in a call to 
BETA or GAMMA depending on the attributes of X. 

The declaration processor chains together all members 
of a generic family and the function processor selects the 
appropriate member of the family by matching the 
arguments used in the reference with the declared 
argument requirements of each member. When the 
appropriate member is found, the original reference is 
replaced by a reference to the selected member. 
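
The selection of a generic family member can be sketched as a search along the chain of members. In this Python illustration the family is a list of (entry name, required argument attributes) pairs and a member is chosen only on an exact match. That matching rule, the function name select_generic and the data layout are simplifying assumptions; the full PL/1 matching rules are not reproduced here.

```python
def select_generic(family, argument_attributes):
    """Return the name of the family member whose declared argument
    requirements match the attributes of the arguments in the reference."""
    wanted = tuple(argument_attributes)
    for entry_name, required in family:
        if tuple(required) == wanted:
            return entry_name
    raise LookupError("no member of the generic family matches")


# DCL ALPHA GENERIC (BETA ENTRY (FIXED), GAMMA ENTRY (FLOAT));
ALPHA = [("BETA", ("FIXED",)), ("GAMMA", ("FLOAT",))]

chosen_for_fixed = select_generic(ALPHA, ["FIXED"])
chosen_for_float = select_generic(ALPHA, ["FLOAT"])
```

A reference whose argument attributes match no member raises LookupError in this sketch, standing in for the diagnostic a compiler would issue.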
Argument processing 

The function processor matches arguments to user- 
declared procedures against the argument types required 
for the procedure. It inserts conversion operators into 
the program tree where appropriate, and it issues 
diagnostics when it detects illegal cases. 

The return value of a function is processed as if it 
were the (n+1)th argument to the procedure, eliminating 
the distinction between subroutines and functions. 

The function processor determines which arguments 
may possibly correspond to a parameter whose size or 
array bounds are not specified in the called procedure. 
In this case, the argument list is augmented to include 
the missing size information. A more detailed description 
of this issue is given later in the discussion of object 
code strategies. 

The built-in function processor 

The built-in function processor is basically a table 
driven device. The driving table describes the number 
and kind of arguments required by each function and is 
used to force the necessary conversions and diagnostics 
for each argument. Most functions require processing 
which is unique to that function, but the table driven 
device minimizes the amount of this processing. 

The SUBSTR built-in function is of particular 
importance since it is a basic PL/1 string operator. It is 
a three argument function which allows a reference to 
be made to a portion of a string variable, i.e., 
SUBSTR(X, I, J) is a reference to the Ith through 
(I+J-1)th character (or bit) in the string X. 

This function is similar to an array element reference 
in the sense that they both determine the offsets of the 
reference. The processing of the SUBSTR function 
involves adjusting the offset and length expressions 
contained in the reference node of X. As is the case in 
all compiler operations on the offset expressions, the 
constant and variable terms are separated to minimize 
the object code necessary to access the data. 
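
The adjustment of the offset and length for a SUBSTR reference can be sketched as follows. In this Python illustration an offset is a pair (constant, variable terms), where the variable terms map a variable name to its coefficient; the starting position contributes (start - 1) times the size of one string element. The function name substr_reference, the representation, and the element size of 9 used in the example are assumptions, not details taken from the compiler.

```python
def substr_reference(offset, start, length, unit=1):
    """Offset and length of SUBSTR(X, start, length).

    offset is (constant, {variable: coefficient}) for the string X.
    start and length are integers (constants) or variable names.
    unit is the size of one character or bit of X in offset units.
    """
    constant, variable_terms = offset
    variable_terms = dict(variable_terms)   # leave the caller's copy intact
    if isinstance(start, int):
        constant += (start - 1) * unit      # folded at compile time
    else:
        # (start - 1) * unit  =  unit * start  -  unit
        constant -= unit
        variable_terms[start] = variable_terms.get(start, 0) + unit
    return (constant, variable_terms), length


x_offset = (90, {})   # a string X at a constant offset of 90

constant_case = substr_reference(x_offset, 2, 3, unit=9)
variable_case = substr_reference(x_offset, "K", "J", unit=9)
```

In the variable case only the term 9*K remains for the object program to compute; the subtraction of one element has already been merged into the constant part.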

The optimizer 

The compiler is designed to produce relatively fast 
object code without the aid of an optimizing phase. 
Normal execution of the compiler will by-pass the 
optimizer, but if extensively optimized object code is 
desired, the user may set a compiler command option 
which will execute the optimizer. The optimizer consists 
of a set of procedures which perform two major 
optimizations: common sub-expression removal and removal 
of computations from loops. The data bases necessary 
for these optimizations are constructed by the parse 
and the semantic translator. These data bases consist of 
a cross-reference structure of statement labels and a 
tree structure representing the DO groups of each 
block. Both optimizations are done on a block basis 
using these two data bases. 

Although the optimizer phase was not implemented 
at the time this paper was written, all data bases 
required by the optimizer are constructed by previous 
phases of the compiler and the abnormality of all 
variables is properly determined. 

Optimization of PL/1 programs 

The on-condition mechanism of the PL/1 language 
makes the optimization of PL/1 programs considerably 
more difficult than the optimization of Fortran pro- 
grams. Assuming that an optimized version of a 
program should yield results identical to those produced 
by the un-optimized version, then if any on-conditions 
are enabled in a given region of the program, the 
compiler cannot remove or reorder the computations 
performed in that region. (Consider the case of a divide 
by zero on-unit which counts the number of times that 
the condition occurs.) 

Since some on-conditions are enabled by default, 
most PL/1 programs cannot be optimized. Because of 
the difficulty of determining the abnormality of a 
program’s variables, the optimization of those programs 
which may be optimized requires a rather intelligent 
compiler. A variable is abnormal in some block if its 
value can be altered without an explicit indication of 
that fact present in that block. An optimizing PL/1 
compiler must consider all based variables, all arguments 
to the ADDR function, all defined variables, and all 
base items of defined variables to be abnormal. If the 
compiler expects values of variables to be retained 
throughout the execution of a call, it must also consider 
all parameters, all external variables, and all arguments 
of irreducible functions to be abnormal. 
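
The classification rules just listed can be written as a small predicate. The Python sketch below is an illustration only: the property names are invented labels for the categories named in the text, and a real compiler would derive them from the symbol table rather than receive them as strings.

```python
# Categories that must always be treated as abnormal.
ALWAYS_ABNORMAL = {
    "based", "addr_argument", "defined", "base_of_defined",
}

# Categories that are abnormal only if values are expected to be
# retained throughout the execution of a call.
ABNORMAL_ACROSS_CALLS = {
    "parameter", "external", "irreducible_function_argument",
}


def is_abnormal(properties, retain_across_calls=False):
    """Decide whether a variable with these properties is abnormal."""
    properties = set(properties)
    if properties & ALWAYS_ABNORMAL:
        return True
    return retain_across_calls and bool(properties & ABNORMAL_ACROSS_CALLS)
```

For example, is_abnormal(["parameter"]) is False, but is_abnormal(["parameter"], retain_across_calls=True) is True.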

Because of the difficulty of optimizing programs 
written in the current PL/1 language, compilers should 
probably not attempt to perform general optimizations 
but should concentrate on special case optimizations 
which are unique to each implementation. Future 
revisions to the language definition may help solve the 
optimization problem. 

The code generator 

The code generator is the machine dependent portion 
of the compiler. It performs two major functions: it 
allocates data into Multics segments and it generates 
645 machine instructions from the internal 
representation. 

Storage allocation 

A module of the code generator called the storage 
allocator scans the symbol table allocating stack 
storage for constant size automatic data, and linkage 
segment storage for internal static data. For each 
external name the storage allocator creates a link (an 
out-reference) or a definition (an entry point) in the 
linkage segment. All internal static data is initialized as 
its storage is allocated. 

Due to the dynamic linking and loading characteris- 
tics of the Multics environment, the allocation and 
initialization of external static storage is rather unusual. 
The compiler creates a special type of link which causes 
the linker module of the operating system to create and 
initialize the external data upon first reference. Therefore, 
if two programs contain references to the same 
item of external data, the first one to reference that data 
will allocate and initialize it. 

Code generation 

The code generator scans the internal representation 
transforming it into 645 machine instructions which it 
outputs into the text segment. During this scan the 
code generator allocates storage for temporaries, and 
maintains a history of the contents of index registers to 
prevent excessive loading and storing of index values. 

Code generation consists of three distinct activities: 
address computation, operator selection and macro 
expansion. Address computation is the process of 
transforming the offset expressions of a reference node 
into a machine address or an instruction sequence which 
leads to a machine address. Operator selection is the 
translation of operators into n-operand macros which 
reflect the properties of the 645 machine. 

A one-to-one relationship often exists between the 
macros and 645 instructions but many operations (load 
long string, etc.) have no machine counterpart. All 
macros are expanded into actual 645 code by the macro 
expander which uses a code pattern table (macro 
skeletons) to select the specific instruction sequences 
for each macro. 
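
Operator selection followed by macro expansion can be sketched with a small pattern table. The Python illustration below is hypothetical throughout: the macro names, the placeholder mnemonics (load, add, store and their floating-point forms) and the skeleton format are invented for the example and are not 645 instructions or the compiler's actual code pattern table.

```python
# Code pattern table: macro name -> instruction skeletons.
# All mnemonics are placeholders, not real 645 instructions.
CODE_PATTERNS = {
    "add_fixed": ["load {a}", "add {b}", "store {r}"],
    "add_float": ["fload {a}", "fadd {b}", "fstore {r}"],
}


def select_macro(operator, result_type):
    """Operator selection: pick the macro for an operator whose result
    has the given type (the type comes from the temporary node)."""
    return f"{operator}_{result_type}"


def expand(macro, **operands):
    """Macro expansion: fill the skeletons with the operand addresses."""
    return [line.format(**operands) for line in CODE_PATTERNS[macro]]


# X = Y + Z with fixed-point operands
code = expand(select_macro("add", "fixed"), a="Y", b="Z", r="X")
```

Keeping the instruction sequences in a table means that the selection logic stays independent of the exact code emitted for each macro.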

Object code strategies 

The object code design 

The design of the object code is a compromise between 
the speed obtainable by straight in-line code and the 
necessity to minimize the number of page faults caused 
by large object programs. 

The length of the object program is minimized by the 
extensive use of out-of-line code sequences. These 
out-of-line code sequences represent invariant code 
which is common to all Multics PL/1 object programs. 
Although the compiled code makes heavy use of out-of- 
line code sequences, the compiled code is not in any 
respect interpretive. The object code produced for each 
operator is very highly tailored to the specific attributes 
of that operator. 

All out-of-line sequences are contained in a single 
“operator” segment which is shared by all users. The 
in-line code reaches an out-of-line sequence through 
transfer instructions, rather than through the standard 
subroutine mechanism. We believe that the time 
overhead associated with the transfers is more than 
redeemed by the reduction in the number of page faults 
caused by shorter object programs. System performance 
is improved by ensuring that the pages of the operator 
segment are always retained in storage. 

The stack 

Multics PL/1 object programs utilize a stack segment 
for the allocation of all automatic data, temporaries, 
and data associated with on-conditions. Each task 
(Multics process) has its own stack which is extended 
(pushed) upon entry to a block and is reverted (popped) 
upon return from a block. Prior to the execution of each 
statement it is extended to create sufficient space for 
any variable length string temporaries used in that 
statement. Constant size temporaries are allocated at 
compile time and do not cause the stack to be extended 
for each statement. 

Prologue and epilogue 

The term prologue describes the computations which 
are performed after block entry and prior to the 
execution of the first source statement. These actions 
include the establishment of the condition prefix, the 
computation of the size of variable size automatic data, 
extension of the stack to allocate automatic data, and 
the initialization of automatic data. Epilogues are not 
needed because all actions which must be undone upon 
exit from the block are accomplished by popping the 
stack. The stack is popped for each return or non-local 
go to statement. 

Accessing of data 

Multics PL/1 object code addresses all data, including 
members of variable sized structures and arrays, 
directly through the use of in-line code. If the address 
of the data is constant, it is computed at compile time. 
If it is a mixture of constant and variable terms, the 
constant terms are combined at compile time. Descrip- 
tors are never used to address or allocate data. 

String operations 

All string operations are done by in-line code or by 
“transfer” type subroutinized code. No descriptors or 
calls are produced for string operations. The SUBSTR 
built-in function is implemented as a part of the normal 
addressing code and is therefore as efficient as a 
subscripted array reference. 



[Figure 7 diagram: stack storage, oldest block first] 

Stack storage for A, holding on-unit control data for X. 
    Procedure A enabled an on-unit for condition X 
    and called procedure B. 

Stack storage for B, holding on-unit control data for X and Y. 
    Procedure B enabled a new on-unit for condition X 
    and an on-unit for condition Y. It then called procedure C. 

Stack storage for C. 
    Procedure C did not enable any on-units. 


String temporaries 

A string temporary or dummy is designed in such a 
way that it appears to be both a varying and non-vary- 
ing string. This means that the programmer does not 
need to be concerned with whether a string expression is 
varying or non-varying when he uses such an expression 
as an argument. 

Varying strings 

The Multics PL/1 implementation of varying strings 
uses a data format which consists of an integer followed 
by a non-varying string whose length is the declared 
maximum of the varying string. The integer is used to 
hold the current size of the string in bits or characters. 
Using this data format, operations on varying strings 
are just as efficient as operations on non-varying strings. 
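
The data format just described can be sketched as follows. This is a minimal illustration in Python, not the Multics implementation: the class name VaryingString and its methods are hypothetical, lengths are counted in characters only, and truncation of an over-long value to the declared maximum is an assumption of the sketch. 

```python
class VaryingString:
    """Sketch of a varying string: a current-length integer followed by
    a fixed-size buffer whose length is the declared maximum."""

    def __init__(self, declared_max):
        self.length = 0                      # the leading integer: current size
        self.buffer = [" "] * declared_max   # non-varying storage, never resized

    def assign(self, text):
        # Assumption: an over-long value is truncated to the declared maximum.
        text = text[: len(self.buffer)]
        self.buffer[: len(text)] = text      # overwrite in place, same slice length
        self.length = len(text)

    def value(self):
        # Only the first `length` characters are meaningful.
        return "".join(self.buffer[: self.length])


s = VaryingString(10)
s.assign("MULTICS")
```

Because the storage never moves or grows, an operation on the varying string only has to read the leading integer before treating the rest as an ordinary fixed-length string. 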

On-conditions 

The design of the condition machinery minimizes the 
overhead associated with enabling and reverting on- 
units and transfers most of the cost to the signal 
statement. All data associated with on-conditions, 
including the condition prefix, is allocated in the stack. 
The normal popping of the stack reverts all enabled 
on-units and restores the proper condition prefix. Stack 
storage associated with each block is threaded backward 
to the previous block. The signal statement uses this 
thread to search back through the stack looking for the 
first enabled unit for the condition being signalled. 
Figure 7 shows the organization of enabled on-units in 
the stack. 
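
The search performed by the signal statement can be sketched as below. The sketch is in Python and is only an illustration under stated assumptions: the names Frame, enable and signal are hypothetical, a Python callable stands in for the procedure variable, and the behaviour when no on-unit is found is left out. 

```python
class Frame:
    """Stack storage for one block activation (hypothetical layout)."""

    def __init__(self, name, previous=None):
        self.name = name
        self.previous = previous   # backward thread to the previous block
        self.on_units = {}         # condition name -> handler (procedure variable)

    def enable(self, condition, handler):
        # Executing an ON statement only records control data in this frame.
        self.on_units[condition] = handler


def signal(top, condition):
    """Walk the backward thread and invoke the first enabled on-unit."""
    frame = top
    while frame is not None:
        handler = frame.on_units.get(condition)
        if handler is not None:
            return handler()
        frame = frame.previous
    return None  # no on-unit enabled; default handling is outside this sketch


# The situation of Figure 7: A enables X, B enables X and Y, C enables nothing.
a = Frame("A")
a.enable("X", lambda: "X handled by A")
b = Frame("B", previous=a)
b.enable("X", lambda: "X handled by B")
b.enable("Y", lambda: "Y handled by B")
c = Frame("C", previous=b)
```

Enabling an on-unit costs one assignment in the current frame, and discarding a frame reverts its on-units with no further work. The cost of the search falls on signal, as the text describes. 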

Figure 7 — Stack storage and the signal mechanism 
A signal for condition X causes the signal mechanism to search 
back through the stack until it finds the first enabled on-unit 
for condition X. 

An on-unit is compiled as an internal procedure. The execution 
of an ON-statement creates a block of on-unit control data. This 
control data consists of the name of the condition for which the 
unit was enabled and a procedure variable. The signal mechanism 
uses the procedure variable to invoke the on-unit. All data 
associated with the enabled on-unit is stored in the stack storage 
of the procedure which enabled it. Normal popping of the stack 
reverts the on-units enabled during the execution of the 
procedure. 

Argument passing 

The PL/1 language permits parameters to be 
declared with unknown array bounds or string lengths. 
In these cases, the missing size information is assumed 
to be supplied by the argument which corresponds to the 
parameter. This missing size information is not explicitly 
supplied by the programmer, as is the case in Fortran; 
rather, it must be supplied by the compiler as indicated 
in the following example: 

    SUB: PROC(A);               MAIN: PROC; 
         DCL A CHAR(*);              DCL SUB ENTRY; 
                                     DCL B CHAR(10); 
                                     CALL SUB(B); 


Since parameter A assumes the length of the argu- 
ment B, the compiler must include the length of B in the 
argument list of the call to SUB. 

The declaration of an entry name may or may not 
include a description of the arguments required by that 
entry. If such a description is not supplied, then the 
calling program must assume that argument descriptors 
are needed, and must include them in all calls to the 
entry. If a complete argument description is contained 
in the calling program, the compiler can determine if 
descriptors are needed for calls to the entry. 

In the previous example the entry SUB was not fully 
declared and the compiler was forced to assume that an 
argument descriptor for B was required. If the entry 
had been declared SUB ENTRY(CHAR(*)) the 
compiler could have known that the descriptor of B was 
actually required by the procedure SUB. Since descrip- 
tors are often created by the calling procedure but not 
used by the called procedure, it is desirable to separate 
them from the argument information which is always 
used by the called procedure. 
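
The decision described in the last two paragraphs can be sketched as follows. This is an illustration in Python under stated assumptions, not the compiler's logic: the function names are hypothetical, parameter declarations are represented as plain strings, and a "size" stands in for a full descriptor. 

```python
def descriptors_required(entry_parameters):
    """Decide whether a call must pass argument descriptors.

    entry_parameters is None when the entry was declared without a
    parameter description (DCL SUB ENTRY;), otherwise a list of
    parameter declarations such as ["CHAR(*)"].
    """
    if entry_parameters is None:
        return True  # nothing is known, so descriptors must be assumed necessary
    # An asterisk extent means the size is taken from the argument.
    return any("*" in declaration for declaration in entry_parameters)


def build_call(arguments, entry_parameters):
    """Return (pointers, descriptors) for a call; descriptors may be None.

    arguments is a list of (value, size) pairs. The two parts are kept
    separate so a callee that ignores descriptors never touches them.
    """
    pointers = [value for value, _ in arguments]
    if descriptors_required(entry_parameters):
        descriptors = [size for _, size in arguments]
    else:
        descriptors = None
    return pointers, descriptors


undeclared = build_call([("0123456789", 10)], None)
fixed_size = build_call([("0123456789", 10)], ["CHAR(10)"])
```

Keeping the descriptors in a separate, optional part of the argument list means a callee that never consults them sees only the pointers. 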

Communication between procedures written in PL/1 
and other languages is facilitated if the other languages 
do not need to concern themselves with PL/1 argument 
descriptors. The Multics PL/1 implementation of the 
argument list is shown in Figure 8. Note that the 
argument pointers point directly to the data (facilitating 
communication between languages) and that the 
descriptors are optional. Also note that PL/1 pointers 


    TAG: PROC; 
         DCL A(10) BIT(N), B CHAR(7), C AREA(1024); 
         CALL X(A, B, C); 
    END; 

The argument list prepared for the call to X: 

    pointers to the actual values of A, B and C 
    descriptor of A:  size N, low bound 1, high bound 10, multiplier N 
    descriptor of B:  size 7 
    descriptor of C:  size 1024 


Figure 8 — An argument list showing the relationship 
between arguments and their descriptors. The 
broken lines indicate that descriptors 
are optional. 


must be capable of bit addressing in order to implement 
unaligned strings. Since descriptors contain no address- 
ing information, they are quite often constant and can 
be prepared at compile time. 

SUMMARY 

Our experiences both as users and implementors of 
PL/1 have led us to form a number of opinions and 
insights which may be of general interest. 

1. It is feasible, but difficult, to produce efficient 
object code for the PL/1 language as it is cur- 
rently defined. Unless a considerable amount of 
work is invested in a PL/1 compiler, the object 
code it generates will generally be much worse 
than that produced by most Fortran or COBOL 
compilers. 

2. The difficulty of building a compiler for the 
current language has been seriously under- 
estimated by most implementors. Unless the 
language is markedly improved and simplified 
this problem will continue to restrict the avail- 
ability and acceptance of the language and will 
lead to the implementation of incompatible 
dialects and subsets. 7 

3. Simplification of the existing language will make 
it more suitable to users and implementors. We 
believe that the language can be simplified and 
still retain its “universal” character and 
capabilities. 

4. The experience of writing the compiler in PL/1 
convinced us that a subset of the language is well 
suited to system programming. This conviction 
is supported by Professor Corbato in his report on 
the use of PL/1 as an implementation language 
for the Multics system. 8 Many PL/1 concepts 
and constructs are valuable, but PL/1 structures 
and list processing seem to be the principal 
improvement over alternative languages. 9 

A design for a fast computer for 
scientific calculations 

The General Electric and English 
Electric Companies Limited 
Borehamwood, Hertfordshire, U. K. 


Recently developed techniques, such as the associative 
fast store and Tomasulo’s algorithm, will enable 
typical large scale computers to achieve 15 to 20 
million instructions per second. The hardware of such 
machines has a very much greater potential power, 
but it is inefficiently used, being limited to decoding a 
single instruction per logic cycle. This paper proposes 
a technique whereby the programmer is provided with 
complex instructions capable of controlling the oper- 
ation of the whole machine during one logic cycle. 
The use of such instructions for the inner loops of 
programs yields substantial performance improve- 
ments without significantly increased costs. 

Recent efforts to develop very fast computers have 
generated two elegant techniques for increasing the 
speed of computers. 

The first is the associative fast store, first used for 
the Titan computer at the University of Cambridge, 
England (a 32 word ‘slave’ store), and more recently 
for the IBM 360/85 (a 16 K byte ‘buffer’ store or 
‘cache’). The associative fast store seeks to overcome 
the major problem in the design of very fast computers, 
the disparity between the access time of suitable main 
stores and the potential operation time of the arithmet- 
ic units, by providing a small quantity of very fast 
integrated circuit store. This can be made as fast as 
the arithmetic units but it cannot contain more than 
a fraction of the information used by typical programs. 
However it has been found experimentally that, in 
any short period of time, programs do not access the 


whole of their storage and that a fast store, which 
retains a few hundred of the words most recently 
used by the program, is able to provide without delay 
almost all the information needed by the processor. 

A possible method of implementation is shown in 
Figure 1. The fast store holds a number of words of 
code and data, together with their addresses. When 
the processor requires a particular item, the address 
is first sent to the fast store where it is compared 
simultaneously with the addresses of all the words 
in the fast store. Should the required item be present 
in the fast store, then its address will match that sent 
by the processor and the data can be returned to the 
processor with minimal delay. If none of the addresses 
match, then the required item must be fetched from 
the main store and the processor may be held up. 
But when the data word has been fetched, in addition 
to being sent to the processor, it can also be inserted 
into the fast store, displacing some other item, so 
that should it be needed again it will be immediately 
available. 
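
The lookup and replacement sequence described above can be sketched in Python as follows. The sketch makes assumptions that the text does not: the class name FastStore is hypothetical, a dictionary lookup stands in for the simultaneous address comparison, and the displaced item is chosen as the least recently used word, whereas the text only says that some other item is displaced. 

```python
from collections import OrderedDict


class FastStore:
    """Sketch of an associative fast store in front of a slower main store."""

    def __init__(self, main_store, capacity):
        self.main_store = main_store      # address -> word
        self.capacity = capacity          # number of words held
        self.words = OrderedDict()        # address -> word, least recent first
        self.accesses = 0
        self.failures = 0                 # accesses that had to go to main store

    def read(self, address):
        self.accesses += 1
        if address in self.words:         # stands in for the parallel compare
            self.words.move_to_end(address)
            return self.words[address]
        self.failures += 1
        word = self.main_store[address]   # the processor may be held up here
        if len(self.words) >= self.capacity:
            self.words.popitem(last=False)  # assumed policy: displace the LRU word
        self.words[address] = word
        return word

    def failure_rate(self):
        return self.failures / self.accesses if self.accesses else 0.0


main = {address: address * 2 for address in range(64)}
store = FastStore(main, capacity=4)
for _ in range(10):                       # a tight loop touching three words
    for address in (5, 6, 7):
        store.read(address)
```

In this run only the first reference to each of the three words goes to the main store, so three of the thirty accesses fail. 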

The success of this technique is entirely dependent 
on the proportion of data items needed by the pro- 
cessor which have to be fetched from the main store, 
and this proportion, the failure rate, is the primary 
criterion of the effectiveness of the fast store. The 
speed of the computer is determined by: 

\[
\text{Effective Access Time} = \text{Fast Store Access Time}
  + \left( \text{Main Store Access Delay} \times \text{Failure Rate} \right)
\]

Figure 1 — The use of an associative fast store to reduce 
the access time of the main store 


If the Main Store Access Delay, which must include 
organisational overheads as well as the Main Store 
Access Time but which may be partially overlapped, 
is equivalent to ten fast store accesses then a failure 
rate of 3 percent must be attained to achieve 75 per- 
cent of the potential processor speed. 
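
The arithmetic behind this figure can be checked with a short Python sketch. It assumes the relation is the fast store access time plus the main store access delay weighted by the failure rate, and it measures time in units of one fast store access; the function names are hypothetical. 

```python
def effective_access_time(fast_access, main_delay, failure_rate):
    """Fast store time plus main store delay weighted by the failure rate."""
    return fast_access + main_delay * failure_rate


def fraction_of_potential_speed(fast_access, main_delay, failure_rate):
    """Speed relative to a machine whose every access hits the fast store."""
    return fast_access / effective_access_time(fast_access, main_delay, failure_rate)


# Units are fast store accesses: the main store delay is worth ten of them.
t = effective_access_time(1.0, 10.0, 0.03)
speed = fraction_of_potential_speed(1.0, 10.0, 0.03)
```

With a failure rate of 3 percent the effective access time is 1.3 fast store accesses, so the machine runs at about 77 percent of its potential speed, which agrees with the figure of 75 percent quoted above. 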

Experimental simulations with actual programs have 
shown that the three characteristics of the fast store 
which most affect the failure rates are its organisation, 
its size, and the size of the unit of information trans- 
ferred from the main store to the fast store. The 
organisation of the fast store need not concern us 
here, except to remark that the type of organisation 
described above is to be preferred to alternative 
methods which avoid the associative access to large 
numbers of addresses. 

The experimental simulations show that the primary 
method of obtaining an adequately low failure rate 
is to make the fast store large enough. If the fast 
store is smaller than several hundred words then 
programs refer to many items not held in the fast 
store and the full performance of the machine is not 
obtained. However the fast store must not be made 
too large, even without cost considerations. As the 
size of the fast store is increased, so its access time is 
also inevitably increased, and eventually this increase 
in the physical access time of the fast store overwhelms 
any further reduction in the number of references 
to the main store. Figure 2 shows the result of simulations 
to obtain the effective access time of a particular 
integrated circuit fast store operating with a thin 
film main store, for seven sample programs. It can 
be seen that for many of the programs the optimum 
size of the fast store is about 1000 words. 

Figure 2 — The physical access time of a fast store (broken line) 
and the effective access times (continuous lines) for sample programs. 
Access time is a percentage of main store access time, 
storage size is in words, and line size is 4 words. 

The description of a fast store given above assumed 
that the unit of information held in the fast store 
was a single word, and that information is transferred 
from main store in single words. The experimental 
simulations have shown that a more efficient unit 
would be a block of a small number of consecutive 
words accessed simultaneously from the main store. 
Such a block is very similar to, though much smaller 
than, a page in a paging system and will be called a 
line. Figure 3 shows how the effective access time of 
an integrated circuit fast store varied with line size 
during the simulation of seven sample programs. It 
can be seen that, when the line size is small, increasing 
it not only improves the physical access time of the 
fast store but also reduces the failure rate, resulting 
in an impressive performance improvement. But for 
larger line sizes any further improvement in the access 
time from the increased line size is offset by increasing 
failure rates and overall performance deteriorates. It 
appears that a line size of between four words and 
sixteen words is suitable, providing that the line size 
is not allowed to exceed the total width of the main 
store. 

Figure 3 — The physical access time of a fast store (broken line) 
and the effective access times (continuous lines) for several sample 
programs. Access time is a percentage of main store access time, 
line size is in words, and store size is 1024 words. 

The associative fast store technique provides very 
fast effective access times and overcomes this problem 
in the design of very fast computers. Thus the onus 
is placed back onto the processor to make full use of 
the speed of the fast store, both by the provision of 
fast arithmetic units and by the execution of lengthy 
arithmetic operations in parallel. A beautiful technique 
for overlapping arithmetic operations has been de- 
veloped by R. M. Tomasulo for the IBM 360/91 
and is known as Tomasulo’s algorithm. 

Consider, for instance, the typical tight loop containing 
floating point load, multiply, add, and store 
instructions operating on the same register. As shown 
in Figure 4a the conventional machine places the 
result of each operation in the register before extracting 
it again to perform the next operation. The register 
has no substantial function in this loop which would 
be more efficiently performed as shown in Figure 4b. 
Here the partial result is passed directly from one 




arithmetic unit to the next without first being placed 
in the register, a technique known as forwarding. Not 
only is this faster, but it also frees the register from 
interlocks, which would prevent its concurrent use 
for subsequent calculations. Thus for the example 
loop, it might be possible to launch the second iteration 
of the loop before the first iteration has been completed. 

The basic structure of a floating point arithmetic 
unit using Tomasulo’s algorithm is shown, slightly 
simplified, in Figure 5. Separate arithmetic units are 
provided for addition and multiplication, and there 
are also units to hold the floating point registers and 
to buffer operands to be written to store. The arithmet- 
ic units are pipelines so that several independent 
operations, in different stages of completion, can be 
processed simultaneously within each arithmetic unit. 
Thus, for instance, the addition unit can start a further 
addition operation each logic cycle even though the 
individual addition operation takes three to four cycles 
to complete. 

In front of each arithmetic unit there is a block of 
registers in pairs, the reservation stations. These serve 
to gather the operands required for the arithmetic 
operations as and when they become available. As 
soon as a reservation station has collected both the 
required operands, the relevant arithmetic operation 
can be started at this, the earliest possible, moment. 
Operands are made available to the reservation stations 
as early as possible by the cross bar switch which 
connects the outputs of all the arithmetic units, the 
registers and the store buffers to the inputs of all the 
reservation stations, so that any operand can be routed 
directly to any reservation station where it is required. 

Tomasulo’s algorithm applies only to operations 
between registers. Consequently arithmetic operations 
that derive one of their operands from store are performed 
in two stages, the first of which loads the 
operand from store into one of the store buffer registers 
while the second is a register to register operation 
between that buffer register and the specified floating 
point register. 

Figures 4a and 4b — The use of forwarding to speed 
arithmetic calculations 

Figure 5 — Typical floating point unit for use with 
Tomasulo’s algorithm 

204 Fall Joint Computer Conference, 1969 

Under Tomasulo’s algorithm instructions are still 
decoded sequentially but their execution proceeds as 
and when the required operands become available. 
Arithmetic operations between registers are performed 
in four stages: 

    select a suitable vacant reservation station, 
    obtain both operands and place them in the 
        reservation station, 
    execute the arithmetic operation, 
    transmit the result directly to all registers and 
        reservation stations waiting for it. 

The identity of the destination register must not be 
held with the operation as it is being processed, for 
arithmetic operations can be performed out of sequence 
and the result of some subsequent operation may 
already have been placed in that register. The essence 
of Tomasulo’s algorithm is that a record is kept, for 
each register, of the origin of the result for which it is 
waiting, the result most recently assigned to it. Previous 
results, directed by the program to pass through the 
register, will be forwarded directly to the relevant 
arithmetic units and can be ignored by the register. 
The same technique is used for reservation stations, 
recording for each which operand or result it is waiting 
for. 

As an example of the required effect, consider the 
short loop referred to above. The first instruction loads 
an operand from store to a floating point register. 
Obtaining the operand from store will take a small 
interval of time, even with an integrated circuit fast 
store, and so a store buffer is allocated, the register 
is set to wait for an operand originating at this store 
buffer, and the next instruction is considered. 

This calls for a multiplication, and a reservation 
station in front of the multiplication arithmetic unit 
is allocated. One of the operands of the multiplication 
is being fetched from store, and one half of the reser- 
vation station waits for an operand from the store 
buffer allocated to this operation. The other operand 
is that to be loaded into the register as a result of the 
previous instruction, but this operand has not yet 
arrived at the register. Thus the other half of the 
reservation station is set to await this operand directly 
from the store buffer allocated to the first instruction, 
bypassing the register and making the operand avail- 
able to the arithmetic unit at the earliest possible 


moment. The result of the multiplication instruction 
is to be placed in the register, which is set to wait for 
it. That the register will now ignore the operand from 
store is of no significance, for that operand will be 
routed directly to the multiplication unit, the only 
place where it is required. 

The third instruction is an addition instruction, 
and an addition unit reservation station is allocated. 
Here too, one of the required operands comes from 
store and the other from a register which has not yet 
received the required result. In this case the register 
awaits a result from the multiplication unit and so 
the reservation station is set to wait for an operand from a 
third store buffer and for the result of the multipli- 
cation. The register now awaits the result of the ad- 
dition. The last instruction stores the contents of the 
register, causing one of the reservation stations for 
buffering operands to be stored to wait for the result 
for which the register waits, the result of the addition 
operation. 

Meanwhile the two initial operands being fetched 
from store have probably arrived and been recognised 
by the multiplication unit reservation station. This 
has enabled the multiplication operation to start. The 
third operand fetched from store will be collected by 
the addition unit reservation station which will have 
to wait for the end of the multiplication for its other 
operand. In due course the result of the addition will 
be routed directly to the storage unit reservation 
station and also to the floating point register unless, 
as is likely, that register is already involved in sub- 
sequent operations. 

The success of this scheme is entirely dependent on 
the registers and reservation stations being able to 
recognize the operands they require. Tag fields, shown 
in Figure 5 attached to each register and reservation 
station, are used for this purpose, containing a four 
or five bit tag identifying the origin of the required 
result or operand. Every operand presented to the 
cross bar switch is accompanied by an identification 
of its origin and each reservation station compares 
this with the origin of the operand for which it waits, 
so that the required operand can be recognised and 
acquired. 

The origin will be one of the floating point registers, 
one of the store buffers or an arithmetic unit. For this 
purpose the arithmetic unit identification requires 
further elaboration since several arithmetic operations 
may be pending. Thus the origin defines the particular 
reservation station of the arithmetic unit rather than 
just the arithmetic unit. 

The tag fields of floating point registers are readily 
set, for the register has been explicitly defined as the 
destination by an instruction which has already se- 
lected a store buffer or a reservation station. Thus the 
origin of the required operand is known. The register 
may have had its tag field set, indicating that it was 
still waiting for the result of some previous instruction. 
This tag field may now be overwritten with the new 
identification since that result will be routed directly 
to any reservation station needing it and there is no 
further need of it in this register. 

Similarly if the operand required for a reservation 
station is to be obtained from a store buffer or is already 
held by its floating point register, then the identity 
of the origin of the operand can readily be inserted into 
the tag field of the reservation station. 

But in many cases the required operand will not 
yet have reached the floating point register. In this 
case the identity of the register cannot be placed in 
the tag field of the reservation station, for the current 
contents of the register are irrelevant and out-of- 
sequence instruction execution may cause some sub- 
sequent result to have been already placed in the 
register before the required operand becomes available. 
However the tag field associated with the register 
indicates the origin of the operand for which it waits 
and which is required by the reservation station. Thus 
if the tag field of the reservation station is set to this 
value, then the required operand can be acquired by 
the reservation station directly it is presented to the 
cross bar switch and without ever passing through the 
register at all. 
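
The tag mechanism can be illustrated with a small Python model of the four-instruction loop discussed earlier. It is a sketch under several assumptions, not a description of the IBM 360/91: all names (Station, FloatingPointUnit, the origins SB1, M1 and so on) are hypothetical, arithmetic takes no time, and a station fires only when a broadcast completes its operands. 

```python
import operator

OPERATIONS = {"mul": operator.mul, "add": operator.add}


class Station:
    """A reservation station holding two operands or the tags awaited."""

    def __init__(self, name, operation):
        self.name = name
        self.operation = operation
        self.awaited = [None, None]   # origin awaited per operand, None once held
        self.values = [None, None]
        self.done = False


class FloatingPointUnit:
    def __init__(self):
        self.value = {}     # register -> last result captured
        self.tag = {}       # register -> origin of the result it awaits
        self.stations = []
        self.stores = {}    # store address -> origin of the value to be written
        self.memory = {}
        self.trace = []     # origins, in the order results reached the switch

    def load(self, register, buffer):
        self.tag[register] = buffer          # register waits on the store buffer

    def issue(self, name, operation, register, buffer):
        """Decode: register = register <operation> operand-from-store."""
        station = Station(name, operation)
        pending = self.tag.get(register)
        if pending is None:
            station.values[0] = self.value[register]   # value already present
        station.awaited = [pending, buffer]  # copy the register's tag, not its name
        self.stations.append(station)
        self.tag[register] = name            # any older result is no longer wanted here

    def store(self, register, address):
        pending = self.tag.get(register)
        if pending is None:
            self.memory[address] = self.value[register]
        else:
            self.stores[address] = pending

    def broadcast(self, origin, result):
        """Present a result, labelled with its origin, to every waiting unit."""
        self.trace.append(origin)
        for register, pending in self.tag.items():
            if pending == origin:
                self.value[register] = result
                self.tag[register] = None
        for address, pending in self.stores.items():
            if pending == origin:
                self.memory[address] = result
                self.stores[address] = None
        for station in self.stations:
            for slot in (0, 1):
                if station.awaited[slot] == origin:
                    station.values[slot] = result
                    station.awaited[slot] = None
        for station in self.stations:
            if not station.done and station.awaited == [None, None]:
                station.done = True
                function = OPERATIONS[station.operation]
                self.broadcast(station.name, function(*station.values))


fpu = FloatingPointUnit()
fpu.load("F0", "SB1")                 # LOAD  F0, a
fpu.issue("M1", "mul", "F0", "SB2")   # MUL   F0, b
fpu.issue("A1", "add", "F0", "SB3")   # ADD   F0, c
fpu.store("F0", "d")                  # STORE F0, d
fpu.broadcast("SB3", 4.0)             # operands arrive from store in any order
fpu.broadcast("SB1", 2.0)
fpu.broadcast("SB2", 3.0)
```

All four instructions are decoded before any operand arrives. The register never receives the loaded operand or the product; it captures only the result of the addition, the last origin assigned to it, while the earlier results go directly to the stations that await them. 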

Tomasulo’s algorithm is very effective and enables 
us to discount the time taken to perform floating 
point addition and multiplication, within limits. A 
machine equipped with a fast store and Tomasulo’s 
algorithm is potentially capable of instruction rates 
approaching one per logic cycle. However the speed 
of the machine is reduced by the need to move data 
between the fast store and slower stores, conditional 
branches, interlocks on indices, and occasional very 
slow arithmetic operations such as division. In practice 
the machine can execute most programs at about one 
instruction per two logic cycles. Using current state 
of the art logic elements and high density intercon- 
nection techniques, logic cycles of 25 to 35 nanoseconds 
can be achieved, yielding an effective machine speed 
of 15 to 20 million instructions per second. 
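
The quoted range follows directly from the cycle times, as the short Python check below shows. The function name is hypothetical, and the figure of two logic cycles per instruction is the average stated above. 

```python
def instructions_per_second(logic_cycle_ns, cycles_per_instruction):
    """Instruction rate for a given logic cycle (in nanoseconds)."""
    seconds_per_instruction = logic_cycle_ns * 1e-9 * cycles_per_instruction
    return 1.0 / seconds_per_instruction


fast = instructions_per_second(25, 2)   # 25 ns cycle: 20 million per second
slow = instructions_per_second(35, 2)   # 35 ns cycle: about 14.3 million per second
```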

Both the associative fast store and Tomasulo’s 
algorithm are standard well understood techniques and 
the machine described above is typical of large scale 
general purpose computers currently being developed 
for delivery in the next few years. A clear requirement 


exists for a small number of very much more powerful 
machines. These are required to perform lengthy 
repetitive “number crunching” in scientific applications 
such as weather forecasting and nuclear physics. 
There appears to be almost no limit to the useful 
speed of such machines, but the applications need a 
performance at least an order of magnitude greater 
than can be readily obtained with the techniques 
described above. 

At first sight the attainment of a major improvement 
in performance is difficult to envisage. The fast store 
is already the fastest possible storage medium; we 
have used the maximum possible amount of logic in 
the arithmetic units, obtaining speed by brute force; 
the subtlety and complication of the control logic 
already approaches the limits beyond which it is no 
longer possible to detect and correct design faults 
and component failures; while the speeds of the logic 
elements and the sheer quantities used are such that 
the most important influence on the clock rate is 
probably the finite velocity of light. 

However, detailed examination of the way in which 
the machine actually operates reveals that much of 
this potential power is squandered. The fast store, 
at least 128 bits wide and possibly 256, is capable of 
producing several operands every logic cycle; on 
average it is required to produce one operand every 
two logic cycles. Behind the fast store is the main 
store, a thin film store with a cycle time of perhaps 
200 nanoseconds. Though its access time is long, this 
store will be well multiplexed and is probably capable 
of producing operands at the same rate as the fast 
store; it is rarely used. The floating point arithmetic 
units are capable of executing perhaps three instruc- 
tions every logic cycle; they are normally required 
to execute one instruction about every three logic 
cycles. 

All this power is wasted because the control logic 
is not able to decode instructions faster than one per 
logic cycle. Attempts have been made to design control 
logic capable of decoding several separate instructions 
simultaneously, but such control units are very large 
and hideously complicated, indeed so large and com- 
plicated that the speed of the machine may actually 
be reduced. The difficulty in the design of such units 
is the nominally sequential nature, and therefore the 
possible interaction, of the instructions being decoded 
in parallel. If these instructions were designed to be 
executed in parallel and were thus known not to inter- 
act, the problems of the control logic would be greatly 
simplified. Even better would be to forego the use of 
instructions designed to meet a general purpose sequential 
programming concept, and to use instead 
instructions functionally oriented around the actual 
hardware present in the machine. 

Figure 6 shows the hardware that might typically 
be present in a large scale general purpose central 
processor of the near future. The machine contains 
four main functional subunits: 

an instruction prefetch unit, 
an address calculation unit, 
an integer arithmetic unit, 
a floating point arithmetic unit. 

There are two storage subsystems, a very fast inte- 
grated circuit store, with associative addressing, and 
a main store of longer access time but multiplexed to 
achieve a similar data rate. The stores are assumed 
to be 128 bits wide which, in view of the simulations 
referred to above, is the narrowest width likely to be 
chosen for a conventional machine and is also the 
narrowest width for which these proposals are feasible. 

The Direct Functional Control scheme provides 
instructions to explicitly control this hardware so as 
to extract maximum performance for scientific calculations. 
This must be achieved with the minimum of 
extra hardware, less because of cost than because 
substantial extra hardware would be likely to lengthen 
the logic cycle and reduce the speed of the machine. 
Each of these instructions, which will have to be at 
least 128 bits long, provides control over all four major 
functional units for one logic cycle and one cycle 
only. But during that one cycle the programmer can 
control directly the arithmetic operations performed 
by each unit and the gating on their data buses. 

The nature of such instructions is most readily 
understood by reference to the floating point unit 
which is shown in Figure 7. Comparing this with 



Figure 4, it can be seen that the hardware is sub- 
stantially the same as for a conventional large scale 
computer employing Tomasulo's algorithm. Data is 
accessed from and returned to the store over double 
word highways so as to utilise fully the available width 
of the stores, the number of floating point registers 
has been increased to eight, and the gating between 
the reservation stations and the arithmetic units has 
been changed to enable any data item to be used as 
either operand of an arithmetic operation. 

Figure 8 shows how the floating point unit is con- 
trolled by a part of the 128 bit wide instruction. This 
instruction contains fields to control the routing of 
data through the cross bar switch, the gating of operands
between the reservation stations and the arithmetic
units, and the arithmetic operations performed.
The instruction controls all the operations of the 
machine during one logic cycle and one cycle only. Thus 
when an instruction selects the registers to be gated 
into the addition and multiplication units, units which
take several cycles to perform their function, the results
of these operations are of no interest to the
instruction that initiated them. Instead the results 
of other addition, multiplication and store access 
operations, initiated by previous instructions, will be 
presented to the cross bar switch during this logic 
cycle. The instruction controls the switch to route 
these operands to their destination registers, and any 
result not routed to a register is lost. 

Because of the very high speed of the design and 
because of its emphasis on the processing of large 
arrays of data, it is not advantageous to place array 
operands in the fast store. The high processing speed 
reduces the interval between delays due to accessing 
slower stores, and the large arrays force other useful 
information out of the fast store without themselves 
gaining any benefit. Consequently array operands 
must be fetched directly from the main store. Because 
of the multiplexing there is no reduction in the availa- 
ble data rate, but the long access time of this store 
forces the programmer to introduce foresight into his 
program, foresight that is readily available for array 
processing. The programmer is also expected to make
use of the 128 bit width of the store to obtain his 
operands from the store in pairs. 

By not using the fast store for operand access and by 
separating its store bus from that to the main store, 
it is possible to use this store for instruction fetches,
and it can now produce one instruction, 128 bits wide,
every logic cycle. The address of this instruction is 
generated by the instruction prefetch unit, and may 
sequentially follow the address of the previous instruction,
or may be a branch to another part of the
program. A conventional machine would queue in- 
structions in the prefetch unit until it was ready to 
decode them one at a time. In the proposed design 
the instructions, all 128 bits of them, are immediately 
decoded and executed without (in most cases) regard 
for queuing or interlocks. 

The address generation unit forms the addresses of 
the operands needed from main store. The conventional 
machine would first search the fast store for these 
operands before accessing the main store. Now the 
fast store is fully occupied with supplying instructions 
and the operand addresses are despatched directly to 
the main store. The access time of the main store 
must be known to the programmer who must base 
his program on the generation of an operand address 
this predetermined number of instructions ahead of 
his requirement for the operand. He must also make 
allowances for the multiplexed nature of the main 
store so as to avoid clashing. 
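
The scheduling obligation this places on the programmer can be sketched as a small planning routine. This is an illustration only: the access time of four logic cycles, the four store banks, the rule that the bank is chosen by the low-order address bits, and the routine name are all assumptions made here and are not taken from the design.

```python
ACCESS_CYCLES = 4  # assumed main store access time, in logic cycles
NUM_BANKS = 4      # assumed number of multiplexed store banks


def schedule_fetches(addresses, first_use_cycle):
    """Plan when each operand address must be despatched.

    Operand k is needed at cycle first_use_cycle + k, so its address must be
    issued ACCESS_CYCLES earlier. A request to a bank that is still busy with
    an earlier request is reported as a clash.
    """
    plan = []
    bank_free_at = [0] * NUM_BANKS  # cycle at which each bank is free again
    for k, address in enumerate(addresses):
        issue_cycle = first_use_cycle + k - ACCESS_CYCLES
        if issue_cycle < 0:
            raise ValueError("operand is needed before it could arrive")
        bank = address % NUM_BANKS  # assumed low-order interleaving
        if issue_cycle < bank_free_at[bank]:
            raise ValueError(f"bank clash on address {address}")
        bank_free_at[bank] = issue_cycle + ACCESS_CYCLES
        plan.append((issue_cycle, address))
    return plan


# Consecutive addresses rotate through the banks and never clash.
print(schedule_fetches([0, 1, 2, 3, 4], first_use_cycle=4))
```

Under these assumptions a stride equal to the number of banks (addresses 0, 4, 8, ...) would be rejected, which is the kind of clash the programmer is expected to foresee.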

The integer arithmetic unit is principally required 


to increment the indices used for array accesses and 
to count round loops. 

Control of the integer arithmetic unit, the address 
generation unit, and the instruction prefetch unit 
requires about 12 bits for each unit, out of the 128 
available in the instruction. A 12 bit displacement 
field must also be provided for address generation. 
Frequently no branch is taken, and in such cases the 
12 bits controlling the instruction prefetch unit are 
redundant. The addition of a single bit to indicate 
this enables these 12 bits to be used as a constant by 
the integer arithmetic unit or as an extension of the
displacement field to 24 bits. Thus control of the 
‘integer half’ of the machine can be obtained for 49
bits. In practice 16 integer registers may be found 
unduly restricting, and increasing the number of 
registers to 32 together with the provision of an ad- 
ditional loop counting facility, independent of the 
integer arithmetic unit, would require another 11 bits 
making a total of 60. 

Control of the floating point unit consists of two
parts: control of the individual arithmetic units and 
control of the cross bar switch. Control of the arithmet- 
ic units requires 20 bits to select the registers to be 
gated into the arithmetic units and to select the 
function for units that can perform several (for in- 
stance, addition and subtraction). The cross bar switch 
as illustrated contains 110 cross points, too large a 
number for direct control to be provided by a 128 bit 
instruction. However by restricting operations of 
limited usefulness, for instance gating two operands 
simultaneously into the same register, and by then 
adopting a simple coding system without loss of gener- 
ality, it is possible to reduce the number of bits required 
to control the cross bar to 48.
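
As a check on the arithmetic, the field widths quoted in the last two paragraphs can be totalled. The sketch below only sums the figures given in the text; the field names are descriptive labels chosen here, not names from the design.

```python
# Field widths in bits, as quoted in the text.
integer_half = {
    "integer arithmetic unit control": 12,
    "address generation unit control": 12,
    "instruction prefetch unit control": 12,
    "displacement field": 12,
    "no-branch indicator": 1,
}
extended_integer_extras = 11  # 32 registers plus the independent loop counter

floating_half = {
    "arithmetic unit control": 20,
    "cross bar switch control": 48,
}

basic_integer_bits = sum(integer_half.values())
integer_bits = basic_integer_bits + extended_integer_extras
floating_bits = sum(floating_half.values())

print(basic_integer_bits)            # 49, the basic 'integer half'
print(integer_bits)                  # 60, with the extended register set
print(integer_bits + floating_bits)  # 128, the full instruction width
```

On these figures the extended integer half and the floating point half together account for exactly the 128 bits available.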

The quantity of extra hardware required to provide 
direct functional control over the machine is quite 
small. Thus it should be possible to adapt a conven- 
tional design to operate in this mode without destroying 
its capability as a strict sequential processor using 
Tomasulo’s algorithm and a fast store. 

The principal implementation problem of the design
is that the whole machine appears completely synchro- 
nous to the programmer. This causes difficulty when 
the machine has to be stopped, as the whole of a rather 
large quantity of logic has to be stopped at the same 
moment in time. The need to stop might occur because 
of an external interrupt, the absence of an instruction 
from the fast store or of a data item from the main 
store, interference in the main store due to peripheral 
transfers, or because of branching. 

For most branching operations there is no need to 
stop the machine during the short interval of time
before the new instructions become available. In many
cases, for instance the end of the row of a matrix, it 
is possible to anticipate the branch and to continue 
to issue a controlled number of instructions from the 
previous instruction stream pending the arrival of the 
new instruction stream from the fast store. But events 
such as the detection of an exceptional case of an operand
may necessitate stopping the processor while
further instructions are fetched, to avoid erroneous 
further processing or the destruction of the operand. 

The ability of the proposed design to issue a complex 
instruction every logic cycle and to make intensive 
use of the available hardware resources, results in a 
very powerful computing machine, a machine that is 
particularly attractive because its hardware cost is 
only marginally greater than for an ordinary large 
scale computer. This performance must be paid for 
chiefly in programming difficulty. Indeed if direct 
functional control was the only mode of operation of
the machine, it would have to be abandoned as unprogrammable.
However the machine can still be
programmed with conventional sequential orders in
exactly the same way as any other machine, direct
functional control being used only for inner loops. The
success of the design depends on the bulk of the code
of a program being executed comparatively infrequently
and on the inner loops in which the bulk of the processing
is performed being comparatively small. It is
then possible to make a reasonable decision as to the
extent to which direct functional control should be
used in any given problem.

The only other technique currently available which 
can yield computers of comparable power is the pro- 
cessor array concept developed by D. L. Slotnick and 


currently being implemented as ILLIAC IV. The 
direct functional control scheme proposed here is substantially
cheaper up to the limits of its performance,
can tackle a wider range of problems, and because of 
its single instruction stream, single data set structure, 
is hopefully easier to program. Programming an appli- 
cation for the ILLIAC IV requires understanding of 
the problem so that it can be reformulated to fit the 
processor array, while direct functional control re- 
quires only local rearrangement of the code. But 
direct functional control can provide only a limited 
performance from a single processor, and a processor 
array can ultimately achieve a very much greater 
processing capability.


INTRODUCTION

This paper describes the results of a collaborative 
design effort aimed at development of a general purpose 
display system for the SDS-940 time-shared computer.†
The important features of the system evolved
gradually from a number of separate design goals. We
wanted a display system that would:

1. Contain an extensive but straightforward set 
of display generating commands. 

2. Be able to generate pictures from highly complex 
data structures. 

3. Allow easy access to display files from user 
programs in the main computer. 


* Formerly consultant to Bolt, Beranek and Newman, Inc. 

† The system to be described was implemented by Sanders
Associates, Inc. for Shell Development Company. An identical 
system was originally planned for delivery to Bolt Beranek and 
Newman Inc., but BBN’s decision to replace their SDS-940 with 
a larger machine altered these plans. 


4. Provide some immediate feedback and inter- 
active processing service to the display user, 
and be able to call upon the main computer 
for more extensive service. 

5. Permit attachment of special purpose display 
generation and interactive hardware, as well as 
multiple display consoles. 

6. Be capable of time-sharing its central resources 
among separate console-users. 

These goals and their influence on the system design 
provide a framework for the detailed discussion that 
follows in the body of this paper. Before proceeding, 
however, we would like to give some orientation by 
presenting an overview of the system design without 
dwelling on our motives. 

Figure 1 is a block diagram of the system. As indi- 
cated, its main components are a display processor 
(including computer interface) that controls the system 
and channels digital information among the other 
components, a display generator that produces
appropriate analog drive signals, and a collection of
display consoles and other peripheral devices. Note
that the display shares the memory of the central
computer.

Figure 1 — System configuration

Figure 2 — Processor

The display generator contains high speed vector, 
character and beam positioning generators. Display 
generators are discussed in references 1, 5, and 8, and 
the characteristics of the display generator for this 
system are discussed in reference 10. This paper is 
primarily concerned with the design of the display 
processor, and with certain aspects of the overall 
system design. 

Figure 2 shows the display processor in more detail. 
One can view it as a collection of registers, each of 
which is connected to two main information paths — 
the Main Input and Main Output Busses. Other 
information paths provide connection to the display 
generator, peripheral devices, and computer interface.

Figures 3a - 3d describe the command set for the
display processor. All commands are “immediate” in 
the sense that each contains its operand(s) in what is 
usually the address field (referred to in this paper as 
the “operand field”). The Display Commands (Figure 


3a) supply information to the display generator via 
the X, Y, and character registers. The Address, Data 
and Miscellaneous Commands (Figures 3b - 3d) affect 
the contents of the various display processor registers, 
and may also cause information to be stored in 940 
memory. Most of these commands contain separate 
fields to specify the operation and the register to be 
operated on. Because the various registers serve dis- 
tinct functions, the effect of a given command will 
vary depending on the register specified. For example, 
a Load of the Program Counter is equivalent to a 
conventional jump; a Loading of the I/O register will 
have an entirely different effect. The I/O commands 
(Figure 3d) transfer digital data and control infor- 
mation to and from the peripheral devices, either 
directly, or via the I/O register. 

As shown in Figure 1, the Display Processor and 
Generator, taken together, control and supply infor- 
mation to the peripheral devices via the Analog and 
Digital I/O Busses. Three paths allow the 940 com- 
puter, in turn, to control and inform the Display 
Processor. These are a direct connection between the
Display Processor and 940 Core Memory, and two
connections, the I/O and interrupt lines, between the
Display and the 940 Processor.

Figure 3a — Display commands

Figure 3d — Miscellaneous and I/O commands


Display commands

One of our goals was to design a rich but “clean”
series of display commands. In particular, we wanted
to avoid a difficulty we encountered in several other
display systems — the fact that word length restrictions
force reliance on two word instructions or on dual
operating modes in which the machine will treat all
words either as display data or instructions, depending
on its mode. The 24 bit word length of the 940 provided
enough space (just barely) to allow all instructions
to carry OPCODE and X-Y or character data in a
single word.

Figure 3a shows the six Display Commands. The
display generator can produce lines and characters.
Lines are drawn in 2 + 3L microseconds, where L is
the length of the line in inches. The beam can be
randomly positioned anywhere on the screen in 7
microseconds maximum. Characters are drawn in from
4 to 12 microseconds, depending on size and number of
strokes required. One command plots three characters
in “typewriter” format*; the remaining commands
specify the endpoints of displayed lines.** The endpoint
of a line can be specified, in two’s complement,
as an absolute location on the 1024 by 1024 coordinate
grid of the display screen or as a relative displacement
from the current beam location. One pair of
commands allows endpoints to be specified in relative
or absolute terms. Another pair allows mixed specifications —
one coordinate absolute, the other relative.
The remaining command allows three endpoints to be
specified as short, relative displacements. Each X or
Y component of a short displacement specification
(Figure 3a) is represented, in two’s complement, by
one sign bit and two magnitude bits. The two magnitude
bits are treated by the hardware as the two high order
bits of a three bit magnitude representation. The low
order bit is assumed to be 0. This allows displacements
of about 0.1 inch in X and Y to be specified. Each
line specification carries an unblank bit (U). If set,
the line will appear; otherwise it will produce an invisible
beam movement.

The appearance of displayed elements is controlled
by the three fields of the display parameter register
(R10) (Figure 4). Eight intensity levels and four
character sizes are available. A line can be drawn solid,
in a variety of dotted and dashed formats, or as a single
dot at its terminal point (point plotting). To allow
independent control of the three parameters, a masking
mechanism is included.† To change parameters one
uses a Load Command (Figure 3c) with bits 12-14
specifying which parameters are to be affected.


* A null code can be placed in the unused character position when
it is desired to plot one or two characters. In addition, the character
generator has an unusually rich complement of control characters,
including space and half space up, down, backwards, and
forwards. Full details are covered in reference 10.

** The starting point for a line or group of characters is the
current beam position. The X and Y registers always contain this 
value; their contents are appropriately updated as each Display 
Command is executed. 

† A similar scheme was used in the Digital Equipment Corporation
(DEC) 340 and 338.

Figure 4 — Operand field for display parameter command


The pushdown stack 

One of our key goals was to achieve a display system 
that would allow us to represent pictures by means of 
complex data structures. Behind this goal was a desire 
to eliminate or minimize the separation that is necessary
in many systems between a “master representation”
and a “display file”. Looking at this rather
general goal in more detail, we wanted the ability to: 

1. Execute nested picture subroutines to arbitrary 
depth. 

2. Create “transparent” subroutines — save and 
restore selected display registers such as the X 
and Y beam position and display parameters 
on entering and leaving a subroutine. 

3. Pass parameters to subroutines. 

4. Easily identify objects selected by light pen 
or stylus in terms of the picture structure. 

5. Perform certain forms of general list processing.

Nested subroutines can be handled by a variety of 
subroutine mechanisms. The need for easy light pen 
selection led us to use a pushdown stack. When processing
a light pen or stylus “hit” one must trace one’s
path back through the subroutine hierarchy in order to 
relate the object selected to the drawing structure. 
Without a pushdown stack this requires search through 
the subroutine structure. With a stack system, however, 
the required trace is maintained compactly and auto- 
matically by the return addresses stored in the stack. 

The use of a pushdown stack is not in itself new with 
this design. The DEC 338 display, for example, made 
very successful use of a stack system. What is unique 
in this display is the way in which the stack was implemented.
The need to save and restore information
other than return addresses meant that it had to be 
possible to push any register into the stack. In order 
to get the information back into the right register, 
data in the stack had to be marked in some way. After 
considering several marking schemes, we hit on the 
idea of placing instructions rather than data in the 
stack. When a display register is “pushed” into the
stack, what actually appears in memory is an instruction
to reload the register in question with its
original contents. 

The notion of putting instructions in the stack, of 


course, changes one’s conception of the whole stack 
mechanism. The POP instruction (counterpart to 
PUSH), for example, becomes a special variety of 
“execute”, and the stack pointer a kind of auxiliary 
program counter. In recognition of this, we reversed 
the direction in which stacks usually build. As infor- 
mation is pushed into the stack, the stack pointer is 
decremented. This means that instructions in the stack 
are “popped” (executed) in the usual low-to-high 
address order. 

Treating the stack pointer as an auxiliary program 
counter suggested that we make it accessible, as is 
the program counter, to certain processor instructions. 
By doing so, we freed the stack from a fixed location 
in core. Because one can load the stack pointer, one is 
free to start the stack where one pleases. Moreover, 
as we shall see below, one can even achieve a stack 
that occupies disjoint areas of memory by saving the 
old stack pointer at the beginning of each new section 
of stack. 

With this background, we can now look at some 
details of the stack system. The Push, Load/Push, 
and Push Data commands (Figures 3b and 3c) place 
information in the stack; Pop and Pop but Skip if
Jump (Figure 3d) get it back out. As mentioned above,
the Push commands assemble instructions in memory;
the Pop commands execute these instructions. The 
Push operation may seem complex, but is in fact quite 
simple. To see this, let us examine a Push command 
in detail. 

1. Assume “push the X Register” has been fetched
into the Instruction Register (R1).

2. The register field (bits 4-7) of R1 selects the
X register (R8). The contents of R8 are copied
to bits 12-23 of R1.

3. Bits 8 and 9 of R1 are cleared to 0. The remainder
of R1 is left unchanged.

4. R1 is copied back into the memory at the
location selected by the Stack Pointer (R3).

5. The Stack Pointer is decremented. 

6. The net result in memory is a “Load the X 
Register” command* with the current X value 
in its operand field. 
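
The steps above can be sketched at the bit level. The sketch rests on assumptions that the text does not state: bit 0 is taken to be the most significant bit of the 24 bit word, and the value placed in bits 8 and 9 of the Push command (0b01 here) is purely hypothetical. The helper names are likewise chosen for illustration.

```python
WORD_BITS = 24  # word length of the 940


def field(first_bit, last_bit):
    """Describe bits first_bit..last_bit, with bit 0 assumed most significant."""
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit
    return shift, (1 << width) - 1


REGISTER_FIELD = field(4, 7)    # selects the register to be pushed
OPERATION_BITS = field(8, 9)    # cleared by Push, leaving a Load command
OPERAND_FIELD = field(12, 23)   # receives the register contents

STACK_POINTER = 3  # R3
X_REGISTER = 8     # R8


def get_field(word, spec):
    shift, mask = spec
    return (word >> shift) & mask


def set_field(word, spec, value):
    shift, mask = spec
    return (word & ~(mask << shift)) | ((value & mask) << shift)


def push(instruction_register, registers, memory):
    """Carry out steps 2 to 5 for a Push already fetched into R1."""
    selected = get_field(instruction_register, REGISTER_FIELD)           # step 2
    word = set_field(instruction_register, OPERAND_FIELD, registers[selected])
    word = set_field(word, OPERATION_BITS, 0)                            # step 3
    memory[registers[STACK_POINTER]] = word                              # step 4
    registers[STACK_POINTER] -= 1                                        # step 5
    return word


registers = {STACK_POINTER: 0o1000, X_REGISTER: 0o1234}
memory = {}
# Hypothetical encoding of "push the X Register".
push_x = set_field(set_field(0, REGISTER_FIELD, X_REGISTER), OPERATION_BITS, 0b01)
stored = push(push_x, registers, memory)
print(oct(get_field(stored, OPERAND_FIELD)))  # the saved X value
```

The word left in memory differs from the Push command only in bits 8 and 9 and in its operand field, which is what allows it to be executed later as a Load of the same register.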

The main use for Push is to save register contents 
for later restoration at the end of a subroutine. As 
indicated in Figures 3a and 3b, Push can be brought
to bear on any register accessible to the programmer. 
Because the stack is marked, a single instruction re- 
stores the information regardless of where it came 
from. 


* A variant of Push will place an Add Command in the stack.

In dealing with display structures, it is convenient
to supply names or tags for the objects being presented. 
These may, for example, be pointers to other areas of 
memory that describe non-graphic properties of the 
objects. The No-Op command (Figure 3b) allows 
names to be included in a display file. It causes no 
action, but its operand field may contain tag infor- 
mation. Push Data allows names to be pushed into 
the stack, a further convenience when tracing back 
through a subroutine hierarchy. This command writes 
its own operand field into the stack in the form of 
a No-Op command. 

The third Push variant — Load/Push — exchanges its 
operand field with the selected register before writing 
the original register contents into the stack. Load/Push 
the Program Counter provides a standard subroutine 
call. The current program location is stored in the 
stack (as a Jump instruction) while the Program 
Counter is simultaneously reset to the subroutine 
entry point specified by the Load/Push command. 
Load/Push can be used in a similar way to save and 
simultaneously reset any other register. 

Load/Push the Stack Pointer deserves special at- 
tention. Because the Stack Pointer is loaded with the 
new value before its original contents are pushed, the 
old value will be pushed into the new stack. Thus,
the first word put into the new stack is a pointer that 
links it to the old stack. It is this feature that allows 
one to create disjoint stacks; the saved stack pointers 
provide an automatic address chain back to the original 
stack. We have chosen to call these stored links “Stack 
Jumps.” 
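
A minimal sketch of a Stack Jump follows. It models stacked commands symbolically as tuples rather than as 24 bit words, and the register names, addresses and function names are assumptions made for illustration.

```python
def push(registers, memory, name):
    """Store a Load command for the register, then decrement the Stack Pointer."""
    memory[registers["SP"]] = ("LOAD", name, registers[name])
    registers["SP"] -= 1


def load_push(registers, memory, name, new_value):
    """Load the register first, then push its original contents."""
    old_value = registers[name]
    registers[name] = new_value
    memory[registers["SP"]] = ("LOAD", name, old_value)
    registers["SP"] -= 1


def pop(registers, memory):
    """Increment the Stack Pointer and execute the command found there."""
    registers["SP"] += 1
    _, name, value = memory[registers["SP"]]
    registers[name] = value


registers = {"SP": 500, "X": 7}
memory = {}

load_push(registers, memory, "SP", 900)  # start a new stack section at 900
print(memory[900])                       # ('LOAD', 'SP', 500): the Stack Jump

push(registers, memory, "X")             # ordinary use of the new section
registers["X"] = 0
pop(registers, memory)                   # restores X
pop(registers, memory)                   # executes the Stack Jump
print(registers)                         # {'SP': 500, 'X': 7}
```

Because the pointer is loaded before the push, the link lands in the first word of the new section, and popping that word returns the Stack Pointer to the old section without any special case.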

Pop, the counterpart to Push, causes the display 
processor to execute instructions in the stack. When 
the processor encounters a Pop, it increments the 
Stack Pointer, fetches the instruction selected by the
new pointer value, executes that instruction, and then 
returns to normal instruction execution under control 
of the Program Counter. Typically, the instructions 
executed by Pop will be Load or No-Op commands 
created by one of the Push instructions. However, 
any instruction can be executed through Pop. 

With the Pop instruction in hand, we can now ex- 
amine a typical subroutine linkage. Having entered 
the subroutine through a Load/Push Program Counter, 
one can use Push or Load/Push commands to save 
any other registers. The net result is a series of Load 
Commands in the stack with a Load Program Counter 
occupying the last (highest numbered) address. Two 
commands, Pop followed by a Jump to the previous
instruction, will restore the saved registers and provide
a subroutine return. The processor loops on these two 
commands, reloading the saved registers, until the 


stored Load Program Counter removes it from the 
loop and returns control to the main program. 
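
The linkage just described can be traced with a small interpreter. This is a sketch under assumptions: commands are modelled as tuples, a Jump is written as a Load of the Program Counter, the Program Counter is assumed to be advanced before a command is executed (so that the value saved by Load/Push is the return address), and HALT is a hypothetical command added only to end the trace.

```python
def execute(command, registers, memory):
    op = command[0]
    if op == "LOAD":                      # Load of PC acts as a Jump
        registers[command[1]] = command[2]
    elif op == "PUSH":
        memory[registers["SP"]] = ("LOAD", command[1], registers[command[1]])
        registers["SP"] -= 1
    elif op == "LOADPUSH":
        old_value = registers[command[1]]
        registers[command[1]] = command[2]
        memory[registers["SP"]] = ("LOAD", command[1], old_value)
        registers["SP"] -= 1
    elif op == "POP":
        registers["SP"] += 1
        execute(memory[registers["SP"]], registers, memory)
    else:
        raise ValueError(f"unknown command {op}")


def run(memory, registers, max_steps=100):
    for _ in range(max_steps):
        command = memory[registers["PC"]]
        registers["PC"] += 1              # assumed: PC advances before execution
        if command[0] == "HALT":
            return registers
        execute(command, registers, memory)
    raise RuntimeError("step limit exceeded")


memory = {
    0: ("LOADPUSH", "PC", 10),   # subroutine call
    1: ("HALT",),                # main program resumes here
    10: ("PUSH", "X"),           # subroutine saves X ...
    11: ("LOAD", "X", 99),       # ... and then disturbs it
    12: ("POP",),                # return sequence: Pop ...
    13: ("LOAD", "PC", 12),      # ... and Jump to the previous instruction
}
registers = run(memory, {"PC": 0, "SP": 200, "X": 5})
print(registers["X"], registers["SP"])  # 5 200
```

The loop at locations 12 and 13 runs twice: the first Pop reloads X, the second executes the stored Load Program Counter, which both leaves the loop and returns to the caller.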

The Pop but Skip on Jump command allows one 
to restore saved registers without returning from a 
subroutine. This command behaves exactly like Pop 
except upon encountering a Load Program Counter 
in the stack. In this event the stacked instruction is 
ignored, the Stack Pointer decremented and the 
Program Counter incremented an extra time. The net 
result is that the processor breaks out of a loop such 
as the one suggested above, just before executing the 
return Jump. 
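
The difference from an ordinary Pop can be sketched as follows. As before, stacked commands are modelled as tuples, and the register names, addresses and the driver routine are assumptions made for illustration.

```python
def pop_but_skip_on_jump(registers, memory):
    """Pop, unless the stacked command is a Load of the Program Counter."""
    registers["SP"] += 1
    _, name, value = memory[registers["SP"]]
    if name == "PC":
        registers["SP"] -= 1   # leave the return Jump in the stack
        registers["PC"] += 1   # skip the command that follows
    else:
        registers[name] = value


def restore_saved_registers(registers, memory, loop_address):
    """Trace the loop: Pop but Skip at loop_address, Jump back at loop_address + 1."""
    registers["PC"] = loop_address
    while registers["PC"] == loop_address:
        registers["PC"] += 1                     # fetching the Pop advances PC
        pop_but_skip_on_jump(registers, memory)
        if registers["PC"] == loop_address + 1:  # not skipped: take the Jump back
            registers["PC"] = loop_address


memory = {
    200: ("LOAD", "PC", 1),   # return address stored by the call
    199: ("LOAD", "X", 5),
    198: ("LOAD", "Y", 6),
}
registers = {"PC": 0, "SP": 197, "X": 0, "Y": 0}
restore_saved_registers(registers, memory, loop_address=12)
print(registers)  # {'PC': 14, 'SP': 199, 'X': 5, 'Y': 6}
```

The saved registers are restored, the Stack Pointer is left pointing just below the stored return Jump, and execution continues at the command after the loop.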

The above discussion has suggested some conven- 
tional uses for the stack instructions. However, such 
features as the ability to manipulate the Stack Pointer 
in various ways permit the user to devise more sophisticated
uses for the stack mechanism. We have
made heavy use of this flexibility in the software 
support package. One example application is the han- 
dling of rubber band lines and other simple constraints 
within the display processor. We accomplish these 
functions by performing list processing in the display 
file using the stack feature. 11 

Experience in working with the system has shown 
that the heavy use of multiple stacks could be more 
efficient if another stack pointer were available or if 
a 14 bit address length general purpose register were 
available for temporary storage of the Stack Pointer. 
The Shell system is being modified to add two such 
14 bit general registers. The ability to execute in- 
structions in the stack has given generality and power 
to the display processor at modest cost. 

Memory sharing 

A consequence of our desire to achieve close coupling 
between pictorial and other information was the need 
to allow easy access to display files from programs in 
the 940. As well as permitting advanced graphics 
applications, we felt that close access would simplify 
the general software support for the display. 

To realize this goal we attached the display pro- 
cessor directly to the core memory of the central 
computer rather than relying on a separate buffer 
memory.* The display processor addresses the 1.75 
microsecond 940 memory through its program counter 
and stack pointer. In operation, the display processor 
refreshes the display consoles by executing display 
commands stored in 940 memory and passing the data 
they contain to the display generator. 

* This connection utilizes the 940’s second memory port. 4

Given this close interconnection between display
and main computer, considerable care was necessary
to ensure a display system that could operate effectively
without degrading or endangering the supporting
time-shared computer system. One potential danger —
competition between display and central processors 
for memory access — was reduced to an acceptable 
level by use of dual access priorities on the second 
path to memory.** 4 

A second and more serious danger — inadvertent 
alteration of 940 memory by a display program — was 
eliminated by including memory mapping and pro- 
tection hardware in the display processor. This 
equipment is identical in function to equivalent hard- 
ware in the 940. 4 By means of this mapping, the 16K 
word “virtual” memory that can be accessed by the
display (and 940) instructions is mapped into 2K 
word physical pages that may be scattered through 
the 64K words of 940 core memory. At any one time 
only a few of these pages may be assigned to the 
display, and those pages that are assigned may be
made accessible for reading only or for reading and
writing. 

Registers in the mapping hardware indicate, for 
each of the eight pages that the display might address, 
whether or not a physical page is assigned, and if 
assigned its status (read only or read/write). Only
the 940 monitor can change the contents of the map 
registers. As shown in Figure 6, memory addresses 
transmitted by the display processor are processed
through the mapping hardware before accessing 940 
memory. Any attempt to address an unassigned page 
or to write into a read-only page stops the display
processor and sends an interrupt signal to the 940. 
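
The address translation described above can be sketched as follows. The page size, the eight virtual pages and the two fault conditions come from the text; the representation of a map register as a (physical page, writable) pair, the exception name and the function name are assumptions made for illustration.

```python
PAGE_WORDS = 2048     # 2K word pages
VIRTUAL_PAGES = 8     # 16K word virtual memory


class MapFault(Exception):
    """Stands in for stopping the display and interrupting the 940."""


def translate(page_map, virtual_address, write=False):
    """Map a 14 bit virtual address to a physical core address.

    page_map holds one entry per virtual page: None if unassigned,
    otherwise (physical_page, writable).
    """
    if not 0 <= virtual_address < PAGE_WORDS * VIRTUAL_PAGES:
        raise ValueError("address outside the virtual memory")
    page, offset = divmod(virtual_address, PAGE_WORDS)
    entry = page_map[page]
    if entry is None:
        raise MapFault("unassigned page")
    physical_page, writable = entry
    if write and not writable:
        raise MapFault("write into a read-only page")
    return physical_page * PAGE_WORDS + offset


page_map = [None] * VIRTUAL_PAGES
page_map[0] = (17, False)   # read only
page_map[1] = (3, True)     # read/write

print(translate(page_map, 5))                  # 34821
print(translate(page_map, 2058, write=True))   # 6154
```

In the hardware only the 940 monitor can change the map registers, so in this sketch `page_map` should be thought of as data the display program cannot reach.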

One consequence of mapping is that undebugged 
display programs are of no danger to the system or 
to other users. Mapping has the additional benefit 
of allowing users and system software designers to 
treat display programs in exactly the same way as 
940 user programs. In fact, because mapping for a 
user's 940 program can be made identical to the mapping
for his display file, the two can share the same
address space and thus be merged in any way the
user pleases. Thus, the user can, if he wishes, create a
common data structure that represents pictorial and
other properties of the objects to be viewed. In addition,
he can achieve an unprecedented richness of
interaction between operations performed at a display
console and the underlying processing in the main
computer.

** The 940 CPU accesses memory through the first path to memory.
The display accesses memory through a second path. Devices
on the second path can request access with either higher or lower
priority than the first path. The display processor overlaps the
drawing of a vector or character with the fetch of the next command.
Memory accesses at this time are with low priority.
When the display operation is completed, access is made with
high priority, if not previously successful. Non-overlapped accesses
are made with high priority. Using the above mechanism,
reasonable assumptions on command mix and the fact that the
940 memory has 4 independent interleaved modules, it has been
estimated that the 940 CPU will be blocked from immediate
memory access less than 2 percent of the time. 10

Processing tasks — Display vs. 940

The issue of how much power to include in the display 
processor is a complicated one. This issue is discussed 
more fully in an earlier paper that was inspired by the 
difficulties we encountered on this project. We chose
to include enough computing power to handle the 
immediate response to interactive events such as 
light pen “hits” or the depression of push buttons. 
Less than this would yield sluggish interaction; tasks 
requiring more power could, we felt, be relegated to 
the 940 processor. 

With these ideas in mind, we equipped the display 
processor with a set of commands aimed specifically 
at interactive situations. As shown in Figure 3c, these 
include bit manipulating and skip commands and an 
arithmetic compare operation. The bit manipulating 
and skip instructions include Clear, Toggle (Comple- 
ment), And, Set, Skip on 0, Skip on 1, Skip on 1 and 
Clear, all handled under the mask in the operand field 
of the instruction. These commands are used to test 
or change status, control interrupt masking and so 
forth. There is also a three way arithmetic compare of 
a selected register with the operand giving a skip of 
0, 1, or 2, depending on the result. This command 
allows one to branch on the X or Y location of the 
display beam or of a coordinate input device. Taken 
together with the Add, Register Exchange and General
Register Commands,† and the stack mechanism, these
interactive commands have allowed us to do such 
things as handle light buttons, produce point rasters, 
and perform the work involved in light pen tracking, 
all without intervention from the 940. Control of the 
display processor is implemented with microcoding and 
a read-only memory. The time required per microstep 
is 400 nanoseconds. Command fetch, decoding, and 
program counter update require 6 microsteps plus a 
memory read time. The number of microsteps required 
per command execution is variable: Load requires 1,
Push 3 and Pop 9, for example. The Pop and General 
Register Commands have the longest execution time. 
The read-only memory can be easily modified or in- 
expensively replaced. This feature will be used to 
modify or add commands thought to be useful from 
the software experience. 11 
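As a rough check on the timing figures above, the following sketch totals the microsteps for the three commands whose execution counts are quoted. The memory read time is not given in the text, so it is left as a parameter; whether execution itself needs further memory reads is also not stated and is ignored here. The function and constant names are illustrative only.

```python
MICROSTEP_NS = 400        # time per microstep, as stated in the text
FETCH_MICROSTEPS = 6      # command fetch, decoding, program counter update

# Execution microsteps quoted in the text for three commands.
EXEC_MICROSTEPS = {"Load": 1, "Push": 3, "Pop": 9}


def command_time_ns(command, memory_read_ns):
    """Fetch overhead + one memory read + execution, in nanoseconds.

    memory_read_ns is an assumption supplied by the caller; the paper
    does not quote the memory read time.
    """
    steps = FETCH_MICROSTEPS + EXEC_MICROSTEPS[command]
    return steps * MICROSTEP_NS + memory_read_ns
```

With the memory read time set aside, Load costs 7 microsteps (2.8 microseconds) and Pop costs 15 microsteps (6.0 microseconds), which is consistent with Pop being among the slowest commands.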

In spite of its power, the display processor must 
call on the 940 for assistance in tasks beyond its capa- 
bilities. In addition, the 940 must, of course, have 
ultimate control over the display. We satisfied both 
needs by connecting the display processor to the I/O 
and interrupt systems of the 940. Through these con- 
nections the display processor can transmit service 
requests to the 940. The 940 processor can in turn 
interrogate and set the registers of the display. To- 
gether with the shared memory mechanism, these two 
connections yield a closeness of coupling that con- 
tributes importantly to the ability of the two machines 
to share their processing resources. 

Through its I/O lines the 940 processor can directly 
access all registers of the display. Any display register 
can be brought into the 940 processor by a 940 Parallel 
Input (PIN) instruction. Conversely, the 940 processor 
can set any display register through a Parallel Output 
(POT) instruction. This feature aids the 940 in ini- 
tializing the display and in processing interrupt 
requests. If the 940 sets the display’s Instruction 
Register (through a POT instruction), the display will 
treat the information as a command, execute it, and 
then halt. Unless directly altered by a command 
executed in this way, the display’s Program Counter 
is not changed. The net result is that the 940 can, in 
effect, “execute” any display instruction. As well as
access to the display registers, the direct I/O connection
allows the 940 to stop and start the display and to
set the display’s memory map and the “device map”
described in the next section.


†Though not directed at any particular interactive function, our
implementation of the processor design allowed us to include
these commands at little cost. They have proven more than worth
the price. The Add (Figures 3b, 3c) and Register Exchange
(Figure 3d) generates a new processor instruction in which OP 2
operates on Rn using the contents of Rq as operand. This allows
one, for example, to add or compare two registers.

The interrupt system gives the display a means for 
requesting help from the 940. Some events in the 
display (irrecoverable errors, for example) can only be 
dealt with by the 940. Either the 940 or the display 
processor can cope with other situations (light pen 
hits, scope edge violations). In recognition of this, we 
grouped all interrupt as well as other control and status 
information into one register — the System Parameter 
Register (R11), shown in detail in Table I. The bottom
twelve bits of this register are accessible both to the 
bit manipulation commands of the display and, via 
the POT/PIN instructions, to the 940. The top seven 
bits are accessible only to the POT/PIN instructions 
because only the 940 can deal with the information 
they contain. 

TABLE I — System parameter register* 

Bit     Function

        (Bits Accessible to 940 Only)

5, 6    These two bits assist the 940 in interpreting
        certain interrupt events.
7       Parity Error Flag.
8       Memory Map Violation Flag.
9       Time-Out Flag (the display has a built-in
        down-counting clock).
10      Halt Mask.
11      Halt Flag.

        (Bits Accessible to 940 or Display)

12      Unused.
13      X Edge Overflow Flag.
14      Y Edge Overflow Flag.
15      Edge Overflow Mask.
16      Synchronous Hit Flag (e.g., light pen).
17      Synchronous Hit Mask.
18      Asynchronous Hit Flag (e.g., pushbutton or
        keyboard).
19      Asynchronous Hit Mask.
20      Blink (toggles continuously at blink rate).
21      Blink Control.
22      Slow Mode Control (for storage tube consoles).
23      Master Unblank (if 0 unconditionally blanks the
        display).

* Nineteen of the possible 24 bits in this register were
implemented.

The lower bits in R11 handle several kinds of events,
for each of which there is a flag bit and a mask bit.
The flag bit is set whenever the event occurs; the
setting of the mask bit determines whether or not an
interrupt signal is sent to the 940. This arrangement
allows the programmer to cope with events through the
bit oriented instructions of the display processor, or,
ignoring them in his display program, to pass them on
as interrupt signals to the 940. In addition, a display
program can request service from the 940 by executing
a Halt and Interrupt instruction (Figure 3d).
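The flag and mask arrangement can be sketched as follows. This is a minimal model, not the hardware logic. It assumes that bit 23 is the least significant bit (inferred from bits 12 to 23 being called the "bottom twelve bits") and that a mask bit of 1 lets the event interrupt the 940; the text does not state the mask polarity. The class and method names are hypothetical.

```python
WORD_BITS = 24


def bit(n):
    """Mask for bit n, ASSUMING bit 0 is the most significant of 24."""
    return 1 << (WORD_BITS - 1 - n)


# Bit assignments taken from Table I.
SYNC_HIT_FLAG, SYNC_HIT_MASK = bit(16), bit(17)
ASYNC_HIT_FLAG, ASYNC_HIT_MASK = bit(18), bit(19)


class SystemParameterRegister:
    """Toy model of the flag/mask pairs in the low bits of R11."""

    def __init__(self):
        self.value = 0
        self.interrupts_sent = 0   # interrupts passed on to the 940

    def set_bits(self, mask):
        self.value |= mask

    def skip_on_1_and_clear(self, mask):
        """Return True (skip) if any bit under the mask is 1, then clear."""
        hit = bool(self.value & mask)
        self.value &= ~mask
        return hit

    def event(self, flag, mask_bit):
        """An event always sets its flag; the mask decides on an interrupt."""
        self.value |= flag
        if self.value & mask_bit:  # ASSUMPTION: mask = 1 enables interrupt
            self.interrupts_sent += 1
```

With the mask left at 0, a display program can poll for a light pen hit with `skip_on_1_and_clear(SYNC_HIT_FLAG)` and the 940 is never disturbed; setting the mask hands the same event to the 940 instead.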

Because the 940 must assist the display processor 
in certain situations, it was necessary to allow display 
users to write real-time 940 programs. The problem 
of preventing real-time programs from degrading the
time sharing performance of the 940 was handled by
setting limits on a display user's CPU usage during 
each refresh cycle of the display. 

Consoles and other I/O devices 

So far, we have considered the display processor 
and its relationship to the parent computer. We were 
also concerned with display consoles and other pe- 
ripheral devices, and their relationship, in turn, to the 
display processor and generator. Our main goal in this 
area was flexibility. We wanted the ability to attach
a variety of display consoles, differing in some cases
in their equipment complements, as well as other non-
display devices including graphic input tablets, and
specialized analog equipment, such as circle or raster
generators. We met this need by dissociating from
the display processor design any consideration of
individual consoles or other devices. Instead, we elected 
to treat these as I/O devices, and to handle their 
control and the transmission of information to and 
from them by means of a very general I/O bus system. 

The digital portion of this bus system is similar in 
nature to the bussing schemes used on several general
purpose computers. Devices are selected by an ad-
dress field in the I/O instructions; all devices are
treated homogeneously as collections of registers; and 
a given register may contain control or status in- 
formation, input or output data, or a mixture of these. 

Figure 3d shows the Input/Output commands. Two 
of these permit the user to transmit information be-
tween the I/O register (R15) and the registers of
external devices. Incoming data and status infor- 
mation can then be examined by the Display Processor, 
through the test and skip instructions described in the 
last section, or dealt with by the 940 through the 
POT/PIN commands. The remaining two commands 
permit somewhat faster direct output of key commands 


and direct testing of key device status bits. As men- 
tioned in the last section, another component in the 
digital I/O bus system is the channeling, through OR 
gates, of synchronous and asynchronous events in the 
peripheral devices into the H1 and H2 bits of the Sys-
tem Parameter Register.

Corresponding to this treatment of digital infor- 
mation, the transmission of analog signals within the 
system was also handled through a bussing scheme, 
which allows input of analog signals to summing points 
within the display generator as well as output of dis- 
play drive signals.* Because of this treatment of pe- 
ripheral devices, one can view the display processor 
and generator taken together as a specialized hybrid 
computer whose main job is to handle a series of I/O 
devices through a combined analog/digital bus system.

Just as the 940 processor is time-shared, we wanted 
the ability to time-share the display processor and 
generator among a number of user consoles without 
danger of interference between them. This was 
achieved by giving the 940 processor the ability to 
control, and thus schedule, usage of the display pro-
cessor, and by allowing for device protection hardware
in the display's I/O bus design. This hardware utilizes 
a mapping scheme similar to the memory mapping 
and protection hardware in the 940 and has the ad-
ditional advantage of allowing a user to refer to pe-
ripheral devices through “virtual” addresses that can
remain constant even though he may be assigned a
different console at different times.
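The idea of a device map can be illustrated with a small sketch. It is only an analogy to the hardware described: the table layout, the address values, and the names `DeviceMap` and `DeviceProtectionError` are assumptions made for illustration and do not come from the paper.

```python
class DeviceProtectionError(Exception):
    """Raised when a program refers to a device it has not been given."""


class DeviceMap:
    """Per-user table from virtual device addresses to physical devices.

    In the design described, the 940 would load such a map when it
    schedules a user onto the display processor (hypothetical model).
    """

    def __init__(self, assignments):
        self._table = dict(assignments)

    def translate(self, virtual_address):
        try:
            return self._table[virtual_address]
        except KeyError:
            raise DeviceProtectionError(
                "virtual device %r is not mapped" % virtual_address
            ) from None


# Two users both refer to "device 0", yet reach different consoles.
user_a = DeviceMap({0: 5})   # virtual 0 -> physical console 5
user_b = DeviceMap({0: 9})   # virtual 0 -> physical console 9
```

A display program written against virtual address 0 therefore runs unchanged whichever console the user is assigned, and an unmapped address is refused rather than reaching another user's device.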


CONCLUSION 

The stack mechanism in this design is the most sig-
nificant departure from previous machine design
practice. The features of a marked stack, and the
ability to create disjoint stacks (through the “stack-
jump” linkage) are both easy to implement and useful.
As is by now well known, the stack feature in a display
processor is essential for orderly treatment of “hits”
detected by the light pen or other stylus devices.

Close coupling between display information and 940 
programs has been achieved by the mechanism of 
shared memory. Other general purpose display systems 
seem to be relying more and more on small local com- 
puters for interactive service and to shield the main 
computer from the display. By contrast, we deliberately 
set out to achieve a rich interaction between display 
and parent computer, and the extremely close coupling
of the two machines reflects this goal. Our experience
so far indicates that this coupling can be achieved
without serious degradation of the 940 time-sharing
system.

* Whether a device generates or responds to analog signals
depends upon bit settings in its control register.

Until now most displays have been treated strictly 
as I/O equipment. As displays have grown in com- 
plexity over the years, however, we have come to 
recognize that display processors have many of the 
attributes of general purpose computers. In recog- 
nition of this, we deliberately approached the design 
problem with a processor-oriented rather than I/O 
device-oriented approach. This thinking is reflected 
in the display’s extensive instruction set, in the use 
of memory and device mapping, in the uniform treat- 
ment of consoles as peripheral devices, and finally, in 
the microcoding and uniform bussing scheme that 
dominate the display processor design.


The System logic and usage recorder


by R. W. MURPHY 

International Business Machines Corporation 
Poughkeepsie, New York 


INTRODUCTION 

A fundamental problem in monitoring the performance 
of a system with a hardware device is too much data. 
Inside the System/360 Model 40, for example, seven- 
teen address bits and sixteen data bits may be processed 
every 2.5 microseconds; this rate is equivalent in 
bulk to about three novels per second but not generally 
equivalent in interest or information. The design ob- 
jective for any hardware monitor, therefore, is to 
reduce the data it sees as soon as possible. 

The associative memory (AM) is an excellent means 
for not recording data beyond significance. The memory 
can be instructed to record data only if they are new; 
if the data have already been seen and stored, no more 
space need be squandered upon them. This philosophy 
of monitoring and measurement has been expanded 
into the System Logic and Usage Recorder, an experi- 
mental device under test in IBM Poughkeepsie's 
SDD Advanced Technology group. 

In the Recorder, the basic associative processes of 
interrogation and storage are extended, by means of a 
system of data routing and field control, into a capa- 
bility for performing advanced data reduction and 
data processing algorithms. The algorithms are pro- 
grammed and retained in a control storage where they 
may be added to or modified by the user. 

Data to be analyzed in the Recorder are collected 
at the host computer through a special monitor inter- 
face which detects and transmits such signals as in- 
struction and data addresses, operation codes, and the 
statuses of channels and internal computer conditions. 
The monitor interface, which consists of 48 lines, is 
one-way, and does not affect the operation of the host 


computer. In addition to the monitor interface, there 
is a standard input/output interface which is used to 
pre-load the associative memory when this is required 
by an algorithm, and over which the collected and 
reduced data are transmitted as the Recorder's output. 

In this paper, some simple data-gathering procedures 
are discussed first in order to introduce the design 
concepts of the Recorder. This is followed by de- 
scriptions of the organization and programming of the 
system, and finally some specific data reduction algo- 
rithms are given. 

Simple data gathering and basic operation 

A question asked in performance measurement is, 
“How much time is spent in executing programs out of 
various areas of storage?” To determine these times, a
counter must be assigned to each of the active areas; 
when an instruction is fetched from an area, clock 
pulses begin incrementing the corresponding counter, 
and continue until an instruction is brought from some 
different area. 

In the Recorder, the counters are assigned to storage 
areas automatically, through associative memory. Ini- 
tially the memory is blank and the counters stand at 
zero; but when the first instruction address is received 
in the Recorder from the computer being monitored, it 
is stored in an associative memory word cell as shown 
in Figure 1. 

This word cell then becomes responsible for monitor- 
ing the storage area 00100 through 001FF, which the 
word cell does by comparing its contents with each 
new instruction address brought into the AM input 
register. As long as there is equality in the high-order
bits of the address (the low-order bits are ignored by
means of a mask), a match will be indicated, and the
match indicator for that cell will continue the selection
of the corresponding counter, allowing it to accumulate
time intervals.

Figure 1 - Assignment of counter to initial execution area

Figure 3 - Correlation of executed area with channel activity

This process of interrogation is repeated until an 
inequality between the value stored in the cell and an 
instruction address produces a mismatch, signalling
that program execution has moved to a different area 
of monitored storage. The mismatch will deselect the 
counter, and will cause the controlling program to 
branch into a write cycle in order to record a new 
active area as shown in Figure 2. 

The process diagrammed in the figure will assign 
counters as they are needed, and record their assign- 
ment in the associative word cells. Since interrogation 
of the associative memory is a single operation, it does 
not matter how many of the cells contain meaningful 
data, and the fineness of the measurements can be
adjusted by means of masking to take advantage of
the available memory space. If execution in the host
computer should revert to an area already identified
by the Recorder, such as 001 in the example, the
original cell's contents will again match the address
and reactivate the counter for additional accumulations.

Figure 2 - Assignment of next counter to next execution area
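The two-branched procedure just described (interrogate; on a mismatch, write a new cell) can be imitated in software as a sketch. The hardware compares every word cell in a single operation, whereas this model searches a list, so it reproduces only the bookkeeping. The class name `AreaTimer`, the 20-bit address width, and the default mask are assumptions chosen to match the 00100 through 001FF example.

```python
class AreaTimer:
    """Assigns a counter to each storage area the first time it is seen."""

    def __init__(self, mask=0xFFF00):
        self.mask = mask        # high-order bits kept, low-order bits ignored
        self.cells = []         # associative word cells (masked addresses)
        self.counters = []      # one counter per word cell
        self.selected = None    # index of the counter now accumulating

    def instruction_address(self, address):
        key = address & self.mask
        if key in self.cells:               # interrogate: match
            self.selected = self.cells.index(key)
        else:                               # mismatch: record a new area
            self.cells.append(key)
            self.counters.append(0)
            self.selected = len(self.cells) - 1

    def clock_pulse(self, pulses=1):
        """Clock pulses increment whichever counter is selected."""
        if self.selected is not None:
            self.counters[self.selected] += pulses
```

Widening or narrowing `mask` changes the size of each monitored area, which is the adjustment of "fineness" mentioned in the text.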

The two-branched monitoring procedure is a basic 
one, and can be made to yield many kinds of infor- 
mation. For example, if channel activity is also moni- 
tored and presented at the interface as a field of bits, 
this field can be juxtaposed with the instruction ad- 
dress field as in Figure 3. 

With this process, which has the same flow chart as 
in Figure 2, a correlation will be made automatically 
between storage usage and channel activity. It is, of 
course, immaterial what kind of data is being brought
to the interface; the user can perform the correlation 
on any combinations of events which are represented 
by digital signals brought over the monitor interface. 

Another form of correlation is of interest because it 
yields information about the sequence of events taking 
place in the monitored system. This procedure consists 
of relating each event to its predecessor by forming an
ordered pair at the AM input register as in Figure 4. 

Two kinds of events are recorded in this process: 
the occupancy of a particular area, and the transition 
from one area to another. The procedure is essentially 
the same as that given by the flow chart of Figure 2, 
except that an additional data routing is programmed. 
Each address is first placed into the left-hand field 
(the current field) and the interrogation is performed. 
Following the action consequent on the interrogation, 
the address is then put into the right-hand field (the 
previous field) and is retained there until the next ad- 
dress arrives and the cycle is repeated. 

Figure 4 - Recording occupancy and transitions of execution areas

This procedure develops a graph of the system's
operation in associative memory, and could be used to
study the operation of paging algorithms. If the full 
instruction address were applied to the memory by 
modifying the mask, all the linkages of a program 
would be recorded and could be used to draw the 
program's block diagram as it was actually executed. 
The application would be very wasteful of space, how- 
ever, and impractical except for very small programs. 
There is a more complex procedure, to be discussed 
later, which eliminates much of the redundant infor- 
mation and makes block diagramming feasible with 
associative memories that will be available in the near 
future. 
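A software sketch of the ordered-pair procedure is given below. Each incoming address is paired with its predecessor; pairs whose two fields agree record occupancy of an area, and pairs whose fields differ record a transition. Keeping a count per pair and using `None` for the empty previous field at start-up are assumptions of this sketch, as are the function names.

```python
from collections import Counter


def record_pairs(addresses, mask=0xFFF00):
    """Count each <current, previous> pair seen at the AM input."""
    pairs = Counter()
    previous = None                      # previous field is blank at start
    for address in addresses:
        current = address & mask         # low-order bits are masked off
        pairs[(current, previous)] += 1
        previous = current               # retained until the next address
    return pairs


def transitions(pairs):
    """Edges (from_area, to_area) of the execution graph."""
    return sorted(
        (prev, cur) for (cur, prev) in pairs
        if prev is not None and prev != cur
    )
```

Calling `record_pairs` with a mask of all ones corresponds to applying the full instruction address, which records every linkage and, as the text notes, consumes far more cells.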

Emphasis so far has been placed upon the associative 
operations and what might be called the logic recording 
capability. The usage recording functions take place 
in the counters, which are actually cells in a supple- 
mentary storage addressed by the associative memory 
as a result of interrogation operations. These cells may 
be set up in various ways to record counts, times, or 
the presence of computer conditions, according to the 
measurements required. 

General design concepts 

The examples of data gathering just discussed show 
that a variety of performance measurements can be 
made, simply by changing the nature and the position- 
ing of data applied to the associative memory. This 
variety is enhanced greatly by means of a stored 
program control system which gives the user full con- 
trol over the functions available in the Recorder. In 
general, each step of a data reduction procedure will 
specify the following elements : 

Routing. The source, length, and terminus of a
field of data to be processed.

Masking. The suppression of part or all of a field
at a particular step in the procedure.

Operation. Interrogate, store, or read for associ-
ative memory.

Branching. Choice of the next step, based upon
results of previous steps.

The specification of these elements applies pri- 
marily to associative memory as it processes the data 
received from the monitor interface, and is incorporated 
in the AM format instruction: 


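+-------+-----------+-----------+------+----------+----------+
| Oper. | Routing 1 | Routing 2 | Mask | Next     | Next     |
| Code  |           |           |      | Instr. 1 | Instr. 2 |
+-------+-----------+-----------+------+----------+----------+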


The operation code for the AM format instruction 
will specify one of the following: 

INTERROGATE — compare contents of input 
register with all stored words and turn on 
match indicators for cells with equal contents. 

INTERROGATE NEXT— same as above, except 
that the match indicator for the next cell is 
turned on. 

WRITE— store the contents of the input register 
into all cells whose match indicators are on. 

WRITE NEW— store the contents of the input 
register in the first vacant word cell. 

WRITE ONE— store the contents of the input 
register in the first cell whose match indi- 
cator is on. 

WRITE ALL— store the contents of the input 
register in all cells regardless of the match 
indicators. 

READ— put the contents of the first cell whose 
match indicator is on into the output register. 
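The operations listed above can be modelled in a few lines. This is a behavioural sketch only: the hardware interrogates all cells at once, whereas the model loops over them. INTERROGATE NEXT is omitted because its exact behaviour is not spelled out here, and it is assumed that writes ignore the mask and that WRITE NEW leaves the match indicators unchanged. The class name is hypothetical.

```python
class AssociativeMemory:
    """Toy model of the word cells and their match indicators."""

    def __init__(self, size):
        self.cells = [None] * size      # None marks a vacant word cell
        self.match = [False] * size     # one match indicator per cell

    def interrogate(self, word, mask=-1):
        """Set indicators for cells equal to word under mask (1 = compare)."""
        self.match = [
            c is not None and (c & mask) == (word & mask) for c in self.cells
        ]
        return any(self.match)          # drives the Instr. 1 / Instr. 2 branch

    def write(self, word):
        """Store into all cells whose match indicators are on."""
        for i, on in enumerate(self.match):
            if on:
                self.cells[i] = word
        return any(self.match)

    def write_new(self, word):
        """Store into the first vacant cell; False if memory is full."""
        for i, c in enumerate(self.cells):
            if c is None:
                self.cells[i] = word
                return True
        return False

    def write_one(self, word):
        """Store into the first cell whose match indicator is on."""
        for i, on in enumerate(self.match):
            if on:
                self.cells[i] = word
                return True
        return False

    def write_all(self, word):
        """Store into every cell regardless of the match indicators."""
        self.cells = [word] * len(self.cells)

    def read(self):
        """Contents of the first cell whose match indicator is on."""
        for i, on in enumerate(self.match):
            if on:
                return self.cells[i]
        return None
```

The boolean results stand in for the match/no-match condition on which each instruction selects its next instruction.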

Two fields of data may be moved simultaneously by 
means of the two routing specifications. These fields 
may be one, two, or three bytes in length, or, alterna- 
tively, a literal constant of one byte may be substituted 
for one of the routing specifications. The routing of 
data will be discussed in more detail in the section on 
Data Paths and Routing Control in conjunction with 
the data paths of the Recorder. 

In general, the fields of data processed are of variable
length, on a byte basis. The associative memory is
eight bytes in width, and its masking is also generally 
controlled on a byte basis. However, many algorithms 
require status bits which must be masked or unmasked 
by bit. The mask specification in the instruction, there- 
fore, consists of fifteen bits, of which the first seven 
apply to the first seven bytes of the associative memory, 
and the remaining eight to the individual bits of the 
eighth byte. In addition, it is also possible to apply a 
literal mask to any byte by placing it in a routing 
specification along with an identifying code. This literal 
mask has precedence over the normal mask, and re- 
mains until removed by another literal. This mask is 
not normally used in data reduction, but is necessary 
for such algorithms as simultaneous addition into 
associative memory or ordered retrieval from it. 
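The fifteen-bit mask specification can be expanded into a full 64-bit comparison mask as sketched below. Three details are assumptions, since the text does not fix them: a 1 means the byte or bit takes part in the comparison, the first bit of the specification is its most significant bit, and the first byte is the most significant byte of the word. The function name is illustrative.

```python
def expand_mask(spec):
    """Expand a 15-bit mask specification to a 64-bit compare mask.

    Bits 14..8 of spec select bytes 1..7 as whole bytes;
    bits 7..0 select the individual bits of the eighth byte.
    """
    if not 0 <= spec < (1 << 15):
        raise ValueError("mask specification must fit in 15 bits")
    byte_select = spec >> 8          # seven byte-level mask bits
    bit_select = spec & 0xFF         # eight bit-level mask bits
    mask = 0
    for i in range(7):               # i = 0 is the first byte
        enabled = (byte_select >> (6 - i)) & 1
        mask = (mask << 8) | (0xFF if enabled else 0x00)
    return (mask << 8) | bit_select
```

The mixed granularity keeps the instruction short while still allowing single status bits in the last byte to be examined.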

The next two instruction specifications of each in- 
struction provide conditional branching to the program, 
based upon the collective condition of the match indi- 
cators. The choice of the next instruction depends on 
the following: 

INTERROGATE      if single or multiple match              Instr. 1
                 if no match                              Instr. 2

WRITE or READ    if one or more match indicators are on   Instr. 1
                 if no match indicators are on            Instr. 2

Data paths and routing control

Figure 5 is a schematic diagram of the data registers
and paths of the system. Each line represents a path
for one byte of data, and a dot where two lines cross
indicates a programmable connection. One group of
six paths (48 bits) carries monitored data from the
interface with the host computer to the input of the
associative memory. The various registers and the
crossbar switch provide buffering and field control over
these data. Another path, one byte wide, connects
memory outputs to memory inputs through an adder
to allow internal processing functions.

Figure 5 - Recorder data paths

The word logic circuits link the supplementary
storage with the associative memory and provide an
addressing function for the two memories. This ad-
dressing function is initiated by interrogating the
associative memory with data in its input register; if
the data in any associative word cell compare equally
with the interrogating data, either that word cell, or
a word cell in supplementary storage in one-to-one
correspondence with it, or both may be selected for
the entry or recovery of data. Explicit addresses for
these word cells do not appear in the instructional
control system. The word logic circuits also provide
other functions, including tie-breaking in the case of
multiple matches and a match/no-match signal for
conditional branching in the program.

Control over the data routing is accomplished within
the instruction by means of routing specifications. The
standard instruction format contains two routing
specifications, each controlling one field of data; a
special instruction format is used for supplementary
storage operations which are to be overlapped with
the associative operations. The routing specification in
the standard format contains 16 bits, identified as
follows:


Change Code (one bit). A zero indicates that the 
A Register is to be left unchanged; a one 
causes the specified field to be entered into 
the A Register before being routed further. 

Literal Code (one bit). A one causes a one byte 
constant from the instruction to be entered 
into the A Register before being routed 
further. This constant replaces the field length 
and source address specification. 

Length Field (three bits). Specifies the number of 
bytes of the field being routed. The maximum 
field length from the monitor register is three 
bytes, and from other sources, seven. A 
length of zero causes no transfer of data. 

Source Address (six bits). Specifies the location at 
which the lowest-order byte of the field to 
be routed is to be found. Successive bytes of
the same field are moved in accordance with
the length specification. 

Terminus Address (five bits). Specifies the location 
to which the lowest-order byte of the field 
is to be routed. Addresses are tabulated below.

Source                    Address    Terminus                    Address

Supp. Store Output        00-0F      Supp. Store Input           00-0F
Assoc. Mem. Output        10-17      Assoc. Mem. Input           10-17
Void                      1A         Void                        1A
I/O Input to Recorder     1B         I/O Output from Recorder    1B
Clock                     1C-1F
Monitor Interface         20-25
Constant                  26




Notes : Addresses are given in hexadecimal. 

The address for the constant is not 
used when the constant is specified as a literal, 
but if the value of the constant is unchanged 
the constant may be routed either alone as a 
one-byte field, or as part of a two- or three-byte 
field at addresses 25 or 24. 

If a void is specified as a source, the 
corresponding terminus is reset to zeros. 

If a void is specified as a terminus, 
positions of the A Register corresponding to 
the source are reset to their new values. 

The two routing specifications per instruction permit 
two fields to be moved simultaneously and in parallel 
from the monitor interface to the associative memory 
input register via the A register and the crossbar 
switch. Transfers of data from sources other than the 
monitor register take place over a bus which is one 
byte wide, and are therefore serial by byte. As a result, 
only one such transfer can be called for in each in- 
struction, using the first routing specification. The 
second routing specification can be used, however, for 
a simultaneous transfer through the crossbar. A literal 
can be specified only with the second specification. 
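The 16-bit routing specification (1 + 1 + 3 + 6 + 5 bits) can be packed and unpacked as in the sketch below. The order of the fields within the word is an assumption: they are taken in the order in which the text lists them, most significant first. The handling of the literal constant, which replaces the length and source fields, is not modelled. All names are illustrative.

```python
from typing import NamedTuple


class RoutingSpec(NamedTuple):
    change: int     # 1 bit: enter the field into the A Register
    literal: int    # 1 bit: a one-byte constant is supplied instead
    length: int     # 3 bits: number of bytes in the field
    source: int     # 6 bits: address of the lowest-order source byte
    terminus: int   # 5 bits: address of the lowest-order terminus byte


_WIDTHS = (1, 1, 3, 6, 5)   # ASSUMED field order, most significant first


def pack(spec):
    """Pack a RoutingSpec into a 16-bit word."""
    word = 0
    for value, width in zip(spec, _WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError("field value %r exceeds %d bits" % (value, width))
        word = (word << width) | value
    return word


def unpack(word):
    """Recover the RoutingSpec from a 16-bit word."""
    values = []
    for width in reversed(_WIDTHS):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return RoutingSpec(*reversed(values))


# Route a three-byte field from the monitor interface (20) to the
# associative memory input (10); addresses are hexadecimal, per the table.
example = RoutingSpec(change=1, literal=0, length=3, source=0x20, terminus=0x10)
```

The differing widths of the two address fields agree with the table: source addresses run up to 26 (hexadecimal) and need six bits, while terminus addresses stop at 1B and fit in five.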

Supplementary storage 

Supplementary storage (SS) is used to retain times, 
counts, and condition codes for which associative pro- 
cessing is not required. However, each word cell of 
supplementary storage corresponds to a unique cell of 
associative memory and may be selected whenever an
interrogation of associative memory turns on the match 
indicator for the corresponding AM cell. The general 
concept is that the AM cell retains data describing the 
state of the monitored machine, while the SS cell 
collects the statistics relative to that state. 

The character of the monitoring algorithms is that 
there is a series of operations involving associative 
memory only, establishing or identifying a record for 


the monitored machine’s state. This process will usually 
be completed only when the computer has assumed a 
new state, but a match indicator will be on, pointing 
to the record of the previous state. If the algorithm 
provides an SS instruction at this time, the SS cell 
will be selected and updated according to the SS in- 
struction. Once the selection has been made, it is not 
affected by any alteration of the match indicators 
until the SS instruction is completed and another one 
issued. 

It may be seen from Figure 5 that the updating is 
accomplished through the adder and the SS input and 
output registers, and that it is possible for AM and 
SS operations to proceed independently once the se- 
lection of an SS cell has been made. This overlap will 
take place automatically for all AM instructions except 
those which call for the transfer of data between associ- 
ative memory and supplementary storage or over the 
I/O channel. The overlapped processing may be 
represented as follows: 


[Timing diagram. Rows: New Monitored Data; AM Proc. (state i,
state i+1, with the recognition of each new state marked);
SS Proc. (state i - 1).]


The time at which the monitored computer assumes 
a new state is taken to be the time of receipt of new 
monitored data, as indicated by the appropriate strobe 
signal from the computer. Since there is generally a 
lag of one cycle before the new state is recognized, the 
clock is buffered so that it may be reset to record a new 
time period starting from the strobe while the old time 
period is retained pending use in the SS instruction. 
If no new state has occurred, the old and new time 
periods are combined. 

The updating of a word in supplementary storage 
is controlled by a single instruction containing specifi- 
cations for performing different operations on four fields 
of the word. These fields may be from one to seven 
bytes in length individually, the combined length not 
exceeding the sixteen bytes of the SS word. The SS 
instruction occupies control storage as part of the 
programmed algorithm, but it differs in format from 
the AM instruction : 


[SS instruction format diagram. Legible field labels: Start;
Oper. Code; Field 1 (LF, OF); Next Instr.]


Note: RC = reset controls
      LF = length of field
      OF = operation on field

The starting location specifies the low-order byte of
field 1, which is updated according to its length and
operation specification. The remaining fields are con-
tiguous in the SS word, and are processed in succession.
If the entire sixteen bytes of the word are not utilized
in an application, the starting location may be other
than zero, and the time of completion of the SS in-
struction will be lessened.

In addition to length, the field specification may call
for one of the following operations:

1. Increment field
2. Add clock to field
3. Put the lesser of the clock reading and the old
   field value in field
4. Put the greater of the clock reading and the
   old field value in field
5. OR the interface byte to the field
6. No operation
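The six field operations can be expressed as a single update function, sketched below. The operation numbers are simply the positions in the list above, not the actual operation codes, and wrapping on overflow at the field width is an assumption; the text does not say how overflow is treated. The function name is illustrative.

```python
def update_field(op, old, length, clock=0, interface_byte=0):
    """Return the new value of one SS field of `length` bytes.

    op follows the numbering of the list in the text (1 to 6).
    """
    if op == 1:
        new = old + 1                   # increment field
    elif op == 2:
        new = old + clock               # add clock to field
    elif op == 3:
        new = min(old, clock)           # keep the lesser value
    elif op == 4:
        new = max(old, clock)           # keep the greater value
    elif op == 5:
        new = old | interface_byte      # OR in the interface byte
    elif op == 6:
        new = old                       # no operation
    else:
        raise ValueError("unknown field operation %r" % op)
    return new % (1 << (8 * length))    # ASSUMPTION: wrap at field width
```

Applied to four adjacent fields, operations 1, 2, 3 and 4 would maintain an occurrence count, a total time, a shortest interval and a longest interval for one monitored state in a single SS instruction.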

Application examples 

In the application examples to follow, the algorithms 
are given as block diagrams, in which each block repre- 
sents one instruction, including data routing, the 
operation, and the masking for AM operations. Data 
are routed by fields, which are constant within each 
application and are designated by capital letters 
generally mnemonic with their meaning. The location 
of a field is indicated by a subscript identifying the 
register involved in the routing or the memory itself. 
These subscripts are : 

b - monitor interface buffer
a - crossbar entry register
i - associative memory input register
s - storage cells of associative memory
o - output register from associative memory
p - input/output registers of supplementary storage

The various fields used in an algorithm form an 
ordered set at the input to associative memory and 
after being written into a particular word cell. The 
notation for such an ordered set is:

<S_s P_s C_s> for a particular stored word

If interrogation is to be performed, it is generally
on a set of such words. This set is not ordered and is
written as follows:

{<S_s P_s ->}

In this example, S and P identify the fields active
in the interrogation, and the dash indicates that the
field occupying that relative location in the word is
masked.

Application 1 : Combinations of events and states 

Problem 

To find out what system states occur over a period of 
operation of a host system, how many times each 
state occurs, and how much time is spent in each state. 
For this application, a system state is defined to be 
one combination within the following classes of moni- 
tored signals: 


Stopped/operating        2 possibilities    1 bit
Running/waiting          2       "          1 bit
Supervisor/problem       2       "          1 bit
Channels busy            8       "          3 bits
Page of instruction    256       "          8 bits


The monitor interface is set up to provide all of the 
above signals except page of instruction on an on-off 
basis. The page of instruction is the high-order 8-bit 
group of the instruction address, whose presence at 
the interface is signaled by means of the instruction 
strobe. An evaluation of the system state is to take 
place at each instruction strobe, or, if instructions are 
not being executed, at each change in the remaining 
conditions. 

Procedure 

Each system state is represented by a particular bit 
pattern in the above array of 14 bits, and is recorded 
in one word of associative memory. The time interval 
and usage of each state are totaled in the corresponding 
word of supplementary storage. If instructions are being 
executed (operating and running program states), the 
entire bit pattern is used, otherwise only program and 
channel statuses are stored. 

Whenever a change of state occurs, the appropriate 
bit pattern is compared simultaneously against all those 
previously stored. If no match is found, indicating a 
new state, the bit pattern is stored in the next vacant 
word, and the statistical fields in supplementary storage 
are initialized. If a match is found, indicating a repe- 
tition, the statistics are updated. 

Interrogations of associative memory may occur as 
a result of instruction strobes without a change from 
the state of the previous interrogation. To detect 
changes, a control bit is added to the array of 14 bits 
and is set to one in the word representing the current 
state of the system. 
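
The procedure above can be sketched in software as follows. This is an illustrative model only: a Python dictionary stands in for associative memory plus supplementary storage, the function name `tally_states` and the sample format are assumptions, and the state that is current when the run ends is left open rather than closed out.

```python
def tally_states(samples):
    """Tally time and usage per system state.

    samples: iterable of (clock, s_bits, page) in time order, where page is
    None when instructions are not being executed.
    Returns {(s_bits, page): [total_time, usage]}.
    """
    stats = {}       # one entry per distinct state, like one word per state
    current = None   # the state whose control bit is set
    entered = None   # clock reading when the current state began
    for clock, s_bits, page in samples:
        # Zeros are used for the page field when no instruction is executing.
        state = (s_bits, page if page is not None else 0)
        if state == current:
            continue                      # strobe without a change of state
        if current is not None:
            stats[current][0] += clock - entered   # time spent in last state
            stats[current][1] += 1                 # usage of last state
        stats.setdefault(state, [0, 0])   # a new state takes a vacant word
        current, entered = state, clock
    return stats


samples = [(0, 3, 10), (5, 3, 10), (8, 3, 11), (12, 1, None), (20, 3, 10)]
result = tally_states(samples)
print(result)
```

The dictionary lookup plays the role of the simultaneous comparison against all stored patterns; the `current` variable plays the role of the control bit.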
Results 

At the end of the evaluation, there will be one word 
of data for each different system state which has 
actually occurred. These can then be printed out using 
the ordered retrieval procedure to present the non- 
executing states first, then the states in page order. 

Algorithm for combination of events and states 

S - Field combining program status and 
busy channels bits (6 bits total) 

P - Page of instruction (8 bits) 

C - Last state indicator 

Fields in storage: <S P C> in associative memory, 
<time, usage> in supplementary storage 

When an instruction strobe initiates a cycle, the 
monitored bits are routed through the A register to 
the I/E register for interrogation of associative memo- 
ry. A match indicates no change of system state and 
completes the cycle. 

If instructions are not being executed, the change 
of program or channel status starts the cycle, in which 
only S bits are taken from the Monitor buffer and 
zeros are put into the P field of the I/E register. 

Before the new state is recorded, the time in the 
last state must be added to the total for that state 
and the usage incremented. This is accomplished by 
interrogating with the last state indicator in order to 
select the corresponding word in supplementary storage. 
Routing of this data through the adder is not detailed. 

After resetting the last state indicator, an interro- 
gation is made with the P and S fields still in the I/E 
register to determine if the current system state is 
one which has been previously recorded. If it has, 
only the last state indicator is stored in preparation 
for the next cycle; otherwise, the entire contents of 
the I/E register are written into the next vacant 
word to record the new state. 

Application 2: Distributions of events 

Problem 

The path length between branches taken may be 
defined as the number of sequential instructions executed 
before a branch to a nonsequential address. It is im- 
portant in determining how far a computer ought to 
look ahead in its instruction fetches. This application 
determines what path lengths actually occur in 
programs and how frequently each occurs. The distri- 
butions are to be found for paths preceding each type 
of branch instruction. 

The signals which the monitor interface supplies 
are the operation code, a bit indicating whether the 
branch was taken, and an instruction strobe. If the 
bit for 'branch taken' were not available, then the 
address and length of instruction could be used to 
make an arithmetic check for nonsequential instruc- 
tions. 

Procedure 

The associative memory is preloaded with the set of 
operation codes of the branch instructions, one word 
for each code. These words also contain a one in a 
single-bit field to indicate "branch taken," and a path 
length field containing zero. In addition, a specially 
marked word is set aside for the running count which 
is initially zero. 

As each operation code and "branch taken" bit is 
brought in at the monitor interface, the combination of 
these two fields is compared against the preloaded set 
of codes. If no match is found, the running count 
field is incremented by one. If a match is found, the 
running count field is routed to the path length field, 
unmasked, and a second interrogation made. If this 
also results in a match, the frequency field of the 
matched word is incremented; otherwise, the new path 
length is stored in a vacant word with an initial fre- 
quency of one. The running count field is reset to 
zero. 

Results 

At the end of the run, there will be one word stored 
for each path length and operation code. These might 
then be printed out using the range retrieval pro- 
cedure to condense the different path lengths into 
groups. 
Algorithm for finding distributions of events 



O - Operation code and "branch taken" bit 
(preloaded set of branch codes with 
L = 0) 

L - Length of path 
C - Code for running count word 
R - Running count 
F - Frequency 

Fields in storage: {<O L C>} in associative memory, 
<R F> in supplementary storage 


Each instruction strobe initiates a test to find if a 
branch was taken for one of the prespecified operation 
codes. These need not be the entire set of the host 
computer. 

If no actual branch is found, the running count is 
incremented by selecting the word where it is stored 
with an interrogation for its code. The field is read 
out of supplementary storage, routed through the 
incrementer, and restored in the same word. 

If the branch has taken place, the running count is 
routed into the I/E register where it becomes the path 
length. The combination of operation code and path 
length then is either stored or, if already in storage, 
causes an increment to be made to its frequency field. 
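
The counting scheme just described can be sketched in a few lines. This is an illustration under assumptions, not the monitor's microprogram: `path_length_distribution` is a hypothetical name, a `Counter` keyed by (operation code, path length) stands in for the stored words and their frequency fields, and operation codes are shown as strings for readability.

```python
from collections import Counter


def path_length_distribution(trace, branch_opcodes):
    """Count how often each path length precedes each taken branch.

    trace: iterable of (opcode, branch_taken) pairs, one per instruction strobe.
    branch_opcodes: the preloaded set of branch operation codes.
    """
    frequency = Counter()
    running_count = 0
    for opcode, taken in trace:
        if taken and opcode in branch_opcodes:
            # Running count becomes the path length; store or increment.
            frequency[(opcode, running_count)] += 1
            running_count = 0
        else:
            running_count += 1
    return frequency


trace = [("ADD", False), ("LD", False), ("BC", True), ("ADD", False),
         ("BC", False), ("BC", True), ("BAL", True)]
dist = path_length_distribution(trace, {"BC", "BAL"})
print(dist)
```

A branch instruction that is not taken is counted as part of the path, which follows the rule that the running count is incremented whenever no match is found.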

Application 3: Short sequences and mixes 

Problem 

Knowledge of instruction mixes can be an important 
factor in the planning of new systems. There are a 
number of ways in which the collection of mix data 
can be specified, all involving some form of sequence 
following or finding. In this example, the problem is to 
find what operation codes immediately precede the 
conditional branch types of instruction, up to a maxi- 
mum of six including the branch. 

Procedure 

One word of associative memory is to be used for 
each mix, with the operation codes distributed across 
the word in six fields of one byte each. As operation 
codes arrive at the monitor interface they are routed 
to successive fields in the Interrogate/Entry register 
and also to a field set aside for comparison against the 
set of conditional branch codes which occupy a special 
set of preloaded words. When one of these codes is 
found, the array of six fields in the I/E register is 
used to interrogate the rest of associative memory 
which holds the arrays already found, and the appro- 
priate entry or updating of usage is performed. The 
I/E register is reset to zeros, and the next operation 
code starts a new sequence. 

The sequence may go beyond five codes before a 
conditional branch is found. In that case, the seventh 
code takes the place of the first, and so on until a 
conditional branch is found. 

Results 

Each word contains one mix of six or fewer oper- 
ation codes. The terminating conditional branch code 
may occupy any of the six fields, but if there is at 
least one zero after it, the entire sequence is as re- 
corded; if not, the preceding five codes are read in 
“end-around” fashion. 


Algorithm for finding short sequences 



Successive operation codes are placed in successive 
O fields across the I/E register by means of a string 
of macroinstructions differing only in the routing 
microinstruction. When a branch operation code is 
received, a common routine is followed to add the new 
mix to storage or increment the usage field of an existing 
mix. 
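
The six-field wraparound and the "end-around" readout can be sketched as follows. This is a minimal illustration: the names `collect_mixes` and `unroll` are hypothetical, operation codes are assumed to be nonzero integers so that zero can mark an empty field, and a `Counter` stands in for the usage fields.

```python
from collections import Counter

FIELDS = 6  # six one-byte fields per word


def collect_mixes(opcodes, cond_branches):
    """Record the six-field word present each time a conditional branch arrives."""
    mixes = Counter()
    word, position = [0] * FIELDS, 0
    for op in opcodes:
        word[position % FIELDS] = op       # the seventh code replaces the first
        position += 1
        if op in cond_branches:
            mixes[tuple(word)] += 1        # enter the mix or update its usage
            word, position = [0] * FIELDS, 0
    return mixes


def unroll(word, cond_branches):
    """Read a stored word back in execution order, ending at the branch."""
    branch_at = next(i for i, op in enumerate(word) if op in cond_branches)
    if 0 in word[branch_at + 1:]:
        return list(word[:branch_at + 1])  # no wraparound: as recorded
    # Wrapped: the codes after the branch field are the oldest ones.
    return list(word[branch_at + 1:]) + list(word[:branch_at + 1])


mixes = collect_mixes([1, 2, 9, 1, 2, 3, 4, 5, 6, 7, 9], cond_branches={9})
for word in mixes:
    print(word, unroll(word, {9}))
```

Because every conditional branch ends a sequence, a stored word holds exactly one conditional branch code, which is what lets `unroll` locate the end of the sequence.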


Application 4: Long sequences 

Problem 

One way of determining the performance of a system 
is to see how often prespecified sequences of events 
occur. In this example an operating system is to be 
tested with a known load to determine if predicted 
sequences of supervisor calls, interrupts, and object 
programs are being followed. The sequences may be 
very long, may overlap or include each other, and 
may start or end with any arbitrary element. 

The change to a new current PSW represents a step 
in the sequence, and can be detected by the fact that 
there is an interruption in the host system or that a 
LOAD PSW instruction is executed. The address of 
the PSW identifies the sequence element and is ob- 
tained from the monitor interface whenever a change 
occurs. 

Procedure 

Associative memory is preloaded with the sequences 
to be followed, the elements of each sequence being 
placed in successive memory words. In the word the 
code for each element occupies one field, in this case 
24 bits of address. The word also contains two single- 
bit fields, one of which contains a one for the start 
and the other a one to indicate the end element. 

This procedure makes use of a special interrogation 
operation for associative memory in which, when a 
word is matched, the next succeeding word in physical 
order is selected for the entry of data. In this case, a 
status bit is entered after this form of interrogation 
in order to keep track of progress through the sequence, 
and the crucial interrogation is made simultaneously 
on the address and status bit. If the interrogation is 
successful after the next element has been received, 
the status bit is moved to the next word. 

In addition to recording successes in traversing 
complete sequences, statistics can be compiled on 
partial traverses in the words of supplementary storage 
corresponding to intermediate sequence elements. 

Results 

At the end of the test, associative memory will 
contain the sequences tested for, and supplementary 
storage the record of how well these sequences were 
followed. The sequences could then be printed as a 
complete test record in a format permitting an item 
by item comparison with results of tests of variations 
of load or system. 


Algorithm for following long sequences 

C - Code for sequence element 
B, E - Start and end 
S - Status bit 


Initially, and at the end of each cycle, the status 
bits are set to one for all first elements. 

When the next code is received from the monitor 
interface, an interrogation is first made to find out if 
that code matches any expected last elements of se- 
quences so far successfully followed. If so, the sta- 
tistics are updated and that element is reset to zero 
status (without affecting other elements in that se- 
quence). 

The same code then is used to interrogate the set 
of all elements whose status bit is one. This operation 
uses the INTER NEXT operation to prepare for the 
eventual entry of a one in the status bit of the next 
word. Figure 6 shows the match indicators turned on 
for the word actually matched. 

Zeros are then set into all status bits, regardless of 
the match indicators, and without resetting them. This 
step clears any elements which may not have been 
matched with this last code. 

Figure 6—Steps in following the sequence GHAAAABCDE 

Finally, all first elements are selected for entry by 
the use of a normal interrogate operation. This se- 
lection is OR’d with the selection obtained by the 
INTER NEXT operation above, so that ones can now 
be entered into the union of the two sets. 
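
The four steps of this cycle can be modeled in software. The sketch below is an assumption-laden illustration: the `Word` class, `load`, and `step` are hypothetical names, a `hits` counter stands in for the statistics kept in supplementary storage, and list scans replace the parallel interrogation. Selecting index `i + 1` models the INTER NEXT operation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    code: int             # C: code for the sequence element
    begin: bool = False   # B: start-of-sequence bit
    end: bool = False     # E: end-of-sequence bit
    status: bool = False  # S: this element is expected next
    hits: int = 0         # completed traversals (supplementary storage)


def load(sequences):
    """Preload sequences into successive words; arm all first elements."""
    memory = []
    for seq in sequences:
        last = len(seq) - 1
        memory.extend(Word(c, begin=(i == 0), end=(i == last))
                      for i, c in enumerate(seq))
    for word in memory:
        word.status = word.begin
    return memory


def step(memory, code):
    """Process one code received from the monitor interface."""
    # 1. Expected last elements that match: update statistics, clear status.
    for word in memory:
        if word.status and word.end and word.code == code:
            word.hits += 1
            word.status = False
    # 2. INTER NEXT: select the word after each matching armed element.
    selected = {i + 1 for i, word in enumerate(memory)
                if word.status and word.code == code}
    # 3. Zero all status bits, matched or not.
    for word in memory:
        word.status = False
    # 4. OR in all first elements, then enter ones into the union.
    selected |= {i for i, word in enumerate(memory) if word.begin}
    for i in selected:
        memory[i].status = True


memory = load([[1, 1, 2, 3]])
for code in [7, 8, 1, 1, 1, 1, 2, 3, 4, 5]:
    step(memory, code)
print(memory[3].hits)
```

Re-arming the first elements on every cycle is what allows a sequence to be picked up in the middle of a run of repeated elements, as in the stream used here.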

Application 5: Block diagramming 

Problem 

In debugging or in evaluating the performance of a 
program it is important to know whether program 
segments are executed in the proper order, how much 
time is spent in each segment, how well they were 
overlapped with channel activity, and if execution 
was forced to wait. Although one or a few segments 
might be singled out for examination by methods 
similar to those of the preceding applications, there is 
difficulty in predicting where and what to look for, 
and a chance of missing something significant. 

If every instruction address were paired with its 
successor in the instruction stream and the combination 
applied to associative memory, eventually the memory 
would contain all the links between instructions for 
that program. However, most instructions have unique 
successors, and the technique would waste memory 
space on redundant information. The essential infor- 
mation is contained in just those linkages from or to 
instructions which have several successors or prede- 
cessors. These linkages can be identified from addresses 
and operation codes in the instruction stream. 

Procedure 

Each word of associative memory contains three 
address fields, the “entry,” “exit,” and “destination.” 
The entry and exit addresses are the first and last of a 
block of sequential instructions, and the destination 
is the entry of a succeeding block, so that each stored 
word represents one linkage in the logical structure of 
the program. 

Certain addresses are identified as exits when they 
occur in the instruction stream accompanied by a 
branch operation code. The first address after an exit is 
automatically an entry to a current block, which will 
occupy one of four possible relationships to blocks 
already found. As the entry and succeeding addresses 
appear in the instruction stream, they are compared 
with previously stored entries and exits to resolve 
whether the current block is new or one being retraced, 
or whether either the current block or an old block is 
to be partitioned. 

As execution of the program proceeds, with repe- 
titions of its segments, most of the linkages will be 
followed one or more times, and the corresponding 
division of the address stream into blocks will be 
established. When these elements are found or repeated, 
their time and usage are noted, and channel and wait 
statuses are correlated with them, using supplementary 
storage for this additional data. 

Results 

It can be shown that each conditional branch in- 
struction will result in at least two, and no more than 
four linkages, and that the number of blocks established 
by the branch is always one less than the number of 
linkages. Since one word of storage is required for each 
linkage, approximately 2700 blocks can be recorded in 
a 4096-word memory. Depending upon the complexity 
of the program’s structure, the memory can cope with 
programs of between 6,000 and 16,000 instructions. 
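
The text does not show how the figure of approximately 2700 blocks is obtained. One possible reading, offered here only as an assumption, is that it follows from the stated bounds: with one word per linkage and one fewer block than linkages per conditional branch, a 4096-word memory holds between one half and three quarters as many blocks as words. The function name below is hypothetical.

```python
WORDS = 4096  # one word of associative memory per linkage


def blocks_recordable(linkages_per_branch: int, words: int = WORDS) -> int:
    """Blocks that fit if every branch yields the given number of linkages."""
    branches = words // linkages_per_branch
    return branches * (linkages_per_branch - 1)   # blocks = linkages - 1


for linkages in (2, 3, 4):
    print(linkages, blocks_recordable(linkages))
```

Under this reading the quoted figure corresponds to an average of about three linkages per branch, and the range of 6,000 to 16,000 instructions corresponds to average block lengths of roughly two to six instructions.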

At the conclusion of a block diagramming evaluation, 
associative memory will contain the structural com- 
position of the program according to its actual exe- 
cution, and supplementary storage will contain the 
statistics correlated with each structural element. The 
standard presentation of this information would be a 
listing of the blocks with their exit linkages governing 
their order. 

Once the information has been collected, other out- 
put procedures can be used to meet special require- 
ments. For documentation of the program, it may be 
desirable to present the block diagram in pictorial 
form, using the host computer to compute and print 
the diagram. When the program is being optimized by 
trial, it will not always be necessary to print out the 
entire listing, but only the more time-consuming 
elements. 

Detailed description of procedure 

If an instruction is a conditional branch, the first 
time its operation code is found in the instruction 
stream, it is recognized to have the potential for a 
different successor in some future execution and there- 
fore it is recorded as the “exit” of a block. Its successor 
of the moment is one “destination” and also an “entry” 
to another, or possibly the same, block. The basic 
record thus consists of three addresses, identifying the 
entry, exit, and one destination of the block. 

When a conditional branch identifies the next ad- 
dress as an entry to a block, this block may intersect 
some block already derived from the instruction stream. 
There are four possible relationships of a current block 
to blocks already traced out, as shown in this diagram: 

[Diagram: a previously stored block and the four current block possibilities] 


In the first possibility, none of the addresses from 
the current entry, Nc through the current exit, Xc, 
will be found to match any previously stored entries 
or exits, Np or Xp; the block is therefore new and can 
be added to the store. 

A current entry may not be recognized, but may be 
followed eventually by an address which does match 
some previously stored entry. The address just previous 
to that matching Np becomes the current exit of a 
block, as shown in 2. above, and the block is recorded 
with Np as destination. The program will continue 
by repeating <Np Xp>, because Xc is not a branch. 

The destination of a block may be to an entry already 
recorded, as shown in case 3. Assuming that no change 
of operation code has taken place, the same exit must 
follow, and the block need not be recorded again unless 
the destination is different. Eventually, in the program’s 
execution only case 3 will be found. 

If a branch, conditional or unconditional, has led to 
a new entry within a block, as shown in case 4, this 
fact will not be known immediately. However, sooner 
or later an address will match the exit, Xp, to signal 
the condition. The current block can be added to the 
store, but the previous block is intersected by it. 



[Diagram: current entry Nc falling inside a previously stored block that ends at exit Xp] 


In order to partition the intersected block discovered 
in this case, it is necessary to determine the address 
one location less than the current entry. This exit is 
not computable exactly when variable-length in- 
structions are being executed, but it might occur again 
in the instruction stream and be recognized because its 
successor matches the entry in question. To cause this 
to occur, a flag is added to the intersected block, re- 
moving it from use by the algorithm, so that if the 
block should be repeated from its original entry, the 
situation will resolve itself into case 2. 

The flagged block might include an initializing routine 
which is never repeated, and the block will contain 
time and status data which cannot be distributed to its 
partitions. Therefore, the flagged block is retained for 
the ultimate readout and presentation of results. 

Special operations in the program, such as multiway 
branches, cause no difficulties to the operation of the 
algorithm when they are based on recognized operation 
codes. If the program changes an operation code to a 
branch, as mentioned in case 3 above, the algorithm 
must be altered to take into account some cases in 
addition to the four cases described. An algorithm 
which makes use of addresses only, and is thus un- 
affected by a changed operation code, has been worked 
out by the author but is not included here. 
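
The four cases can be illustrated with a simplified software model. This sketch is not the algorithm of the paper: it assumes the trace is given as (address, is_branch) pairs, keeps only usage counts (no time or channel statistics), and uses Python sets and dictionaries in place of associative interrogation. All names are hypothetical.

```python
from collections import Counter


def block_diagram(trace):
    """Derive <entry, exit, destination> linkages from an instruction trace.

    trace: list of (address, is_branch) pairs in execution order.
    Returns (links, flagged): usage per linkage, and the (entry, exit)
    blocks withdrawn from use after being intersected (case 4).
    """
    links = Counter()     # (N, X, D) -> usage
    entries = set()       # entries of stored, unflagged blocks
    entry_of_exit = {}    # exit address -> entry of the block ending there
    flagged = set()
    entry = trace[0][0]   # the first address opens the first block
    for (addr, is_branch), (successor, _) in zip(trace, trace[1:]):
        # A block ends at a branch, or just before a known entry (case 2).
        if not (is_branch or successor in entries):
            continue
        previous_entry = entry_of_exit.get(addr)
        if previous_entry is not None and previous_entry != entry:
            # Case 4: a stored block reaches this exit from another entry.
            flagged.add((previous_entry, addr))
            entries.discard(previous_entry)
        # Cases 1 and 3: a new linkage is stored, a repeated one is counted.
        links[(entry, addr, successor)] += 1
        entries.add(entry)
        entry_of_exit[addr] = entry
        entry = successor
    return links, flagged


T, F = True, False
trace = [(100, F), (101, F), (102, F), (103, F), (104, T), (200, T),
         (102, F), (103, F), (104, T), (300, T),
         (100, F), (101, F), (102, F), (103, F), (104, T), (200, F)]
links, flagged = block_diagram(trace)
for linkage, usage in links.items():
    print(linkage, usage)
print(flagged)
```

In this trace the block from 100 to 104 is first stored whole, then intersected by a branch into 102 and flagged; when execution later repeats from 100, the block is split at 101 as in case 2, while the flagged record is kept for readout.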
Block Diagramming Algorithm 

N - Entry address 
X - Exit address 
D - Destination address 
O - Operation code 
INTRODUCTION 

The NASA Electronics Research Center (ERC) in 
Cambridge, Massachusetts, has undertaken a broad 
program to satisfy flight computer system requirements 
for future missions, including versatility and long term 
reliability. Specific attention to these requirements is 
necessary because flight qualified aerospace computers, 
and even some still under development, have been 
designed for increased computational speed and 
arithmetic capability, but not for the long life reliability 
and application flexibility that will be required for 
future space missions. 1,2 For example, the mean time 
between failure (MTBF) of available aerospace com- 
puters lies in the range of 2,000 to 5,000 hours, whereas 
long space missions will require an MTBF of 10^6 hours. 

Several computer organizations have been described 
in the literature which include redundancy for in- 
creasing mission reliability, but still neglect appli- 
cations versatility. 3,4 Some non-spaceborne computers 
of the array or multiprocessor type are currently being 
developed. 5,6 These systems, although potentially capa- 
ble of meeting ERC’s versatility and reliability ob- 
jectives, lack design features for space applications 
(component reliability, weight, volume, radiation 
hardness, etc.). 

This paper describes the architecture of a modular 
computer which can be configured to operate as a 
number of parallel processors, with each segment or 
column solving an independent problem that may be 
different or identical. Each column in turn contains a 
number of blocks called modules, which may be con- 
figured so as to form patched columns, using modules 
from different physical locations; for example, a diag- 
onal (see Figure 1). This structure meets the high 
speed computational requirements for attitude control 
associated with strapdown systems, and also achieves 
the reliability required for long time mission success. 

The modular computer requirements have been de- 
rived through simulations which yielded speed, word 
length, and memory requirements. 

A breadboard model consisting of two columns has 
been built and is currently in the terminal stage of 
system checkout. Software is being developed con- 
currently with hardware. This Modular Computer 
Breadboard (MCB) will be used for experimenting with 
different structures in order to enhance the NASA 
ERC modular computer objective. The body of this 
paper describes the LSI implementations of the modular 
computer, with requirements and organization given 
in the following sections. 

The NASA modular computer requirements 7 * 

The functional design requirements can be character- 
ized by high probability of success over a short period 
for high speed computations and survival for long 
periods at low computation rates. 

* A summary is included here for easy reference. 

The Modular Computer, as a potential component 
of a guidance and navigation subsystem of several 
potential space booster configurations, must be appli- 
cable to at least four distinct missions: the synchronous 
satellite, lunar orbiter, Mars orbiter, and Jupiter fly-by 
solar probe. Computer memory size, word length, and 
speed requirements for each phase of these four missions 
have been estimated by means of computer simulations. 
The object computer was assumed to have single- 
address and sequential operation. 

Figure 2 shows the computational requirements as a 
function of injection velocity accuracy. Next to relia- 
bility, computational speed is the most critical param- 
eter. Only one set of curves is shown for all missions 
since it has been assumed that the guidance computa- 
tional requirements up to and including injection are 
the same for all missions. The speed (instructions per 
second) axis represents equivalent additions per second 
at a rate of 1 multiply equals 6 adds. The memory 
requirements include approximately 1,400 words for 
executive and IO operations, for a total of 12,800 words. 

In terms of physical parameters, it is estimated that 
radiation, temperature, and computer operability 
requirements represent the most critical environmental 
conditions which the modular computer must meet. 
The proposed trajectories could subject the spacecraft 
to 3 to 48 hours of 1-MeV electron flux of 10^9 e/cm^2 
sec and 80-MeV proton flux of 10^7 p/cm^2 sec. Repre- 
sentative calculations of anticipated ambient thermal 
environments clearly indicate that an environmental 
control system is needed. The mission time requirement 
for navigation varies from six hours for the synchronous 
satellite to 436 days for the Jupiter fly-by. These times 
pose stringent reliability requirements. 
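
To give a feel for why these mission times are stringent, the sketch below relates mission length to MTBF. It assumes a constant failure rate, so that the probability of surviving t hours is exp(-t/MTBF); this exponential model is a common simplification introduced here for illustration and is not stated in the paper. The function name is hypothetical.

```python
import math


def survival_probability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure during the mission, constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)


JUPITER_FLYBY_HOURS = 436 * 24   # 436-day navigation requirement

for mtbf in (2_000, 5_000, 1_000_000):
    p = survival_probability(JUPITER_FLYBY_HOURS, mtbf)
    print(f"MTBF {mtbf:>9,} h: survival probability {p:.4f}")
```

Under this assumption a computer with an MTBF of 5,000 hours would complete the Jupiter fly-by only about 12 percent of the time, while an MTBF of 10^6 hours gives about 99 percent.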
Figure 2 — Computational requirements for injection into parking orbit 


The modular computer architecture 

Design philosophy 8 

The most severe requirements in terms of speed and 
accuracy occur during boost. 7 Post injection computa- 
tional requirements are low and the accuracy of compu- 
tations is far less critical. Therefore, to satisfy the 
composite requirements a Modular Computer (MC) 
organization as shown in Figure 3 has been structured. 
Each column of the MC can satisfy the 1.5 × 10^6 in- 
structions/sec requirement. 

During boost, three columns of the modular computer 
operate concurrently in a triple modular redundant 
(TMR) mode, with majority voting at the outputs. 
After orbit injection, the TMR mode is terminated 
and the ensemble of modules is configured so that only 
one computer remains operating; the others are turned 
off to conserve power and improve reliability.* System 
interlocks are provided which insure that the on- 
computer performs correctly (within bounds). If this 
is not the case, the Configuration Assignment Unit 
(CAU) is triggered. It is the task of this unit to assemble 
at least one computer out of all the available modules. 

* The failure rate of non-operating circuits is assumed to be 
lower than that for operating ones. 
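
Majority voting over three column outputs can be sketched as below. The paper does not specify how the voter is built; a bit-by-bit two-out-of-three vote on integer output words is assumed here, and both function names are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three vote over three column output words."""
    return (a & b) | (a & c) | (b & c)


def dissenting_columns(a: int, b: int, c: int) -> list[int]:
    """Indices of columns whose output differs from the voted result."""
    voted = majority_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


outputs = (0b1010, 0b1010, 0b0110)   # third column has two bit errors
print(bin(majority_vote(*outputs)), dissenting_columns(*outputs))
```

A single faulty column is outvoted on every bit, and the list of dissenting columns gives a natural trigger for the kind of reconfiguration the CAU performs.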

The availability of good modules is determined by 
means of hardware-software tests with interlocks. As 
may be seen from Figure 3, each of the computers has 
been separated into four functional modules: a Memory 
Unit, Control Unit, Arithmetic Unit, and an IO Unit. 
The Configuration Assignment Unit (CAU) in con- 
junction with the CU, together with the Configuration 
Control Switches (CCS), can automatically reconfigure 
the ensemble so as to form an operating computer. 
Such a computer may consist of any combination of 
MU-i, CU-i, AU-i, IO-i. 

The breadboard version of the modular computer 
contains two columns. This is sufficient for the intended 
experiments: 

1. Determination of mission algorithms within 
specified accuracy limitations and consistent 
with the intended application. 

2. The use of parallel processing to achieve higher 
effective computational speed. 

3. Automatic detection and isolation of the occur- 
rence of a computer module failure, and auto- 
matic reconfiguration to eliminate the effects of 
the faulty element. 

Computer structure 

Although Figure 3 shows a tri-column configuration, 
the actual flight computer may require additional 
columns and some configuration adjustment in order 
to meet the mission time requirements. 9 

In general terms, the modular computer consists of: 

k — Configuration Assignment Units (CAU) 
one set of Configuration Control Switches (CCS) 
m — Control Units (CU) 
n — Arithmetic Units (AU) 
p — Input Output Units (IOU) 
q — Memory Units (MU) 
r — Power Supply Systems 

The values of k through r are determined from 
reliability requirements and configuration alternatives. 
In the preliminary design, k = r = 1 and m = n = p 
= q = 3. This configuration will be adjusted as re- 
quired. 


The configuration assignment unit (CAU) 


The Configuration Assignment Unit controls the 
switches which interconnect the various modules to 
produce the necessary computer or computers. The 
CAU monitors CU requests for changes in the com- 
puter’s configuration and, based on a predefined test, 
may accept or reject these requests. It determines if 
no operating “computer” exists, and then establishes 
new configurations until a working “computer” is 
assembled. The CAU contains registers which permit 
communication between control units. CU interrupts 
are generated in the CAU by means of Status and 
Mask registers. The system clock is also located in 
the CAU. The primary tasks of the CAU are: 


1. To validate requests for change from a CU by 
monitoring the elapsed time and the result of a 
diagnostic, and then accepting and imple- 
menting the request. 

2. To connect all possible configurations one at a 
time until one operating computer is found, 
based on diagnostics. 


3. To initiate an interrupt in a newly configured 
computer to start a diagnostic. 

4. To provide and monitor a counted delay of 
about 30 seconds which, if not reset in time, 
will be interpreted as the absence of a working 
computer, which will initiate task 2 above. 

5. To maintain configuration and status infor- 
mation during a shut-down if power is main- 
tained to the CAU. When power is restored, 
the two previously stored configurations will be 
exercised first to locate an operating computer. 
If these fail, 2 above is initiated. 

6. To accept from the executive CU requests for 
changes in IO configuration. 
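
Tasks 2 and 5 above amount to a search over module combinations, with previously stored configurations tried first. The sketch below illustrates that search under simplifying assumptions: each computer uses exactly one module of each kind, the diagnostic is an arbitrary callable supplied by the caller, and the names `find_working_computer`, `modules`, and `stored` are hypothetical.

```python
from itertools import chain, product


def find_working_computer(modules, diagnostic, stored=()):
    """Try stored configurations first, then every combination, one at a time.

    modules: dict mapping "MU", "CU", "AU", "IO" to lists of module names.
    diagnostic: callable taking a (mu, cu, au, io) tuple, True if it works.
    Returns the first working configuration, or None if none exists.
    """
    every_combination = product(
        modules["MU"], modules["CU"], modules["AU"], modules["IO"]
    )
    tried = set()
    for config in chain(stored, every_combination):
        if config in tried:
            continue                  # do not repeat a stored configuration
        tried.add(config)
        if diagnostic(config):
            return config
    return None


modules = {kind: [f"{kind}-{i}" for i in (1, 2, 3)]
           for kind in ("MU", "CU", "AU", "IO")}
failed = {"CU-1", "AU-1", "AU-2"}     # hypothetical failed modules
works = lambda config: not failed.intersection(config)
print(find_working_computer(modules, works))
```

With three modules of each kind, as in the preliminary design, there are 81 single-MU combinations, so an exhaustive search is small enough to complete within a short sequence of diagnostics.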


Configuration control switches (CCS) 

As seen from Figure 3, the CCS’s provide a path 
between any module in a row and any module in the rows 
immediately above and below. In addition, the switches 
provide for traffic between the CU and IOU modules. 
All paths are under the control of the CAU. 


Control unit (CU) and Arithmetic unit (AU) 

The Control Unit determines the sequence of oper- 
ations within the computer, which consists of one or 
more MU’s, one AU, and any applicable IOU; i.e., all 
computer memory, arithmetic, and input/output oper- 
ations are under the control of the CU. As is seen from 
Figure 3, the traditional ACP (Arithmetic and Control 
Processor) has been split into separate functions of 
CU and AU. This is done to enhance processing speed 
and long term reliability. Each unit has a set of 16 
temporary registers much like the multi-usage reg- 
isters of third generation computers, except that there 
are three index registers which are separate and distinct 
in addition to the temporary registers. The AU and 
CU operate concurrently. The AU accepts data and 
instructions from the coupled CU and executes these 
instructions under internal control, making the results 
available to the same CU. Two’s complement arithme- 
tic, both floating and fixed point, is included. 

The input output units (IOU) 

The Input Output Units (IOU) are of the direct 
memory access type, which provide cycle-stealing ac- 
cess for IO transfers. Each IOU provides two input and 
two output channels. The IOU contains two registers 
which can be loaded by the CU. These registers hold 
the priority and normal operation control words. When 
data for the priority channel is absent, the normal 
transaction is served. 

Figure 3 — Modular computer organization 

The memory unit (MU) 

The Memory Unit (MU) receives, parity checks, and 
stores incoming data in the assigned address. The ad- 
dress is also checked for parity. At present, each memo- 
ry unit can store 4,096 words of 36 bits each. A read- 
restore cycle is completed in 1 microsecond. Each 
memory is addressable by any CU, as permitted by 
the CAU. Two CU's are not allowed to be associated 
with one MU. The CAU may permit a CU access to 
more than one memory. Memory access is through a 
combination of sequential and priority control. First 
priority is assigned to data from the IOU, while second 
priority is assigned to the CU. 

LSI implementation of the modular computer 

Overview 

Size, power, and reliability constraints demand that 
the modular computer be implemented with LSI 
circuits, but the question of how to achieve an LSI 
implementation remains. To date, several approaches 
to logic partitioning for LSI have been reported, ranging 
from the conventional approach, where partitioning is 
done after the logical equations have been written, to 
the “cellular” type approach, where a group of logical 
gates are structured to be programmed on the cell to 
form specific functions. 9, 10, 11, 12 

The conventional approach includes both manual 
and automatic partitioning. This approach appears 
undesirable for the modular computer implementation 
because the design process tends to be lengthened 18 and 
the number of LSI chip types tends to increase, 
particularly as applications are broadened outside of the 
computer proper. A small number of LSI chip types 
is an important factor towards achieving the very 
tight quality and process controls required for the 
realization of very low component failure rates. The 
latter is a must for long time mission reliability. 

In the cellular approach, the cell design is such that 
all combinations of n variables must be implementable 
in order for the cell to be of universal use. Proofs have 
been developed showing that such a cell can indeed 
form all functions of n variables. The cell, although a 
universal device, still requires the process of writing 
logic and determining which paths in the cell structure 
should be connected or cut (physically or logically) 
in order for the universal cell to assume the unique 
logic posture specified by the logic designer. 

A functionally organized set of building blocks with 
predetermined* logic interconnects has been chosen 
for the modular computer implementation. This set, 
called functional characters, tends to satisfy the re- 
quirements of a small number of LSI types. 

The set of characters, 10 in all, was selected through 
a pragmatic approach to logic partitioning. As in the 
cellular technique, the characters have predetermined 
logic interconnects but do not require restructuring of 
interconnections in order to achieve the logical design 
objective. The design process with functional characters 
is analogous to programming using a compiler. The 
characters are analogous to compiler statements. The 
designer specifies inputs, outputs, and control for each 
character’s micro-operation. Micro-programming is 
used as the control structure. Three of the 10 characters 
comprise the micro-program store. Perhaps designing 
with pre-specified large functions without the 
utilization of Boolean equations marks the greatest 
departure and contribution of the functional characters. 
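The compiler analogy above can be illustrated with a small sketch. Everything in it is hypothetical: the `RegisterBlock` class, the `run` helper, and the three operation names are invented for illustration and are not the paper's character set or microinstruction format. The sketch only shows the idea of fixed blocks whose behavior is chosen by a stored sequence of micro-operations rather than by Boolean equations.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterBlock:
    """Hypothetical stand-in for a register-storage character (four registers)."""
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])


def run(microprogram, block, width_bits=8):
    """Execute micro-operations in order.

    Each step is a tuple (op, dst, a, b). The operation names are
    illustrative only and do not come from the paper.
    """
    mask = (1 << width_bits) - 1
    for op, dst, a, b in microprogram:
        if op == "LOAD":      # dst <- constant a
            block.regs[dst] = a & mask
        elif op == "ADD":     # dst <- regs[a] + regs[b], truncated to the word width
            block.regs[dst] = (block.regs[a] + block.regs[b]) & mask
        elif op == "AND":     # dst <- regs[a] AND regs[b]
            block.regs[dst] = block.regs[a] & block.regs[b]
        else:
            raise ValueError(f"unknown micro-operation: {op}")
    return block.regs


# The "design" is just the microprogram; the blocks themselves never change.
program = [
    ("LOAD", 0, 200, None),
    ("LOAD", 1, 100, None),
    ("ADD", 2, 0, 1),
    ("AND", 3, 0, 1),
]
result = run(program, RegisterBlock())
print(result)
```

Changing the behavior means editing `program`, not the block definitions, which is the sense in which the text compares characters to compiler statements.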

No attempt is made to demonstrate that a character 
or the set can implement all combinations of n variables. 
All combinations are not required in order to build 
effective computing machines. The design philosophy 
permits the introduction of new characters if the 
existing ones are shown to be ineffective. 

The ten functional characters exist as logical blocks 
containing approximately 350 gates per block. These 
blocks can be subpartitioned into smaller blocks with 
fewer gates per block or chip, whereby several smaller 
blocks would compose a functional character (see Table 
V). A reduced-width set of functional characters has 
been breadboarded using conventional IC circuits. 
This demonstrated the modularity and versatility of 
the characters. 


The characters can be implemented with LSI circuits, 
using cellular or threshold logic, or any other appropri- 
ate technique. An overview of the characters is pre- 
sented here.** Statistics are given comparing the func- 
tional character design of the Modular Computer 
Breadboard with the implementation utilizing custom 
logic design and partitioning, as found in the imple- 
mented Modular Computer Breadboard (MCB). Re- 
grettably, there is no means for a one-to-one comparison 
using identical stages of MCB implementation. To 
the extent practical, the comparisons address the same 
system parameters. The comparisons assume that all 
cards of MCB containing IC’s have been converted to 
equivalent LSI chips. 

Description of the functional characters 

The functional character set is a group of logic 
arrays forming a self-sufficient family of building blocks 
that reduce computer design to a determination of 
character types and number, followed by micro- 
programming of the set. Ten character types have been 
shown to be sufficient for the building of both special 
purpose and stored program general purpose digital 
equipments. These characters are: 

G1 Register storage 
P1 Scratch pad memory 
L1 General logic 
L2 Arithmetic logic 
L3 Input/Output 
M1 Micromemory sequencer        (microprogram memory) 
M2 Micro-instruction register   (microprogram memory) 
MM Micromemory array            (microprogram memory) 
P2 Up/Down counter 
P3 Switch 

Table I shows the gates, pins, and gates/pin ratios for 
these functions. 

Characters of the same letter are logically grouped 
into a common unit, as illustrated in Figure 4. This 
arrangement extends the register count and word 
length. The complexity of logical operation can also 
be extended by the cascading of characters. Several 
microprogram strings can be executed simultaneously. 
The micro memory function was divided into three 
characters in order to provide for greater versatility. 
The array can be adapted to different size programs. 
The instruction register may be cascaded using two or 
more M2 characters, and still operate under a single 
sequencer control. 

* The logic of the block is designed prior to the computer 
design. 

** More detailed discussion on the subject is found in the 
paper by F. D. Erwin and J. F. McKevitt in these Proceedings. 

Figure 4 — Typical functional character configuration 



TABLE I — Composition of the ten character types 
sufficient for building special purpose and stored 
program GP digital equipments 

Character               Gates   Pins    G/P 
G1 General Register       224     62    3.4 
P1 Scratch Pad          Depends on system architecture; (8 X 16) bits/block 
L1 Boolean                274    145    1.8 
L2 Arithmetic             250     77    3.3 
L3 I/O                    377    149    2.5 
M1 Sequencer              348     91    3.8 
M2 Instruction            323    131    2.5 
MM Array                Depends on size of program; 2048 bits/block 
P2 Up/Down Counter        147     81    1.8 
P3 Switch                 210    118    1.8 


Functional character implementation of the 
modular computer breadboard (MCB) 

The functional character appears to have a broad 
range of applications. This was demonstrated in the 
study by implementing an A to D, DDA, and the 
modular computer. 16 For the purpose of evaluation, 
the breadboard version (MCB) was implemented using 
the functional characters. The MCB is a two column 
configuration of the modular computer. The functional 
and operational aspects of the MCB have been pre- 
served in the functional character implementation. 
However, the implementation detail was tailored to 
the functional character set. This includes the grouping 


of registers into memory arrays and complete micro- 
programming, which are not part of the MCB. 

Figure 5 shows the block diagram of the existing 
MCB. This diagram has been overlaid with the 
character implementation as shown in Figure 6. Note 
the P3 blocks of Figure 6 are equivalent to the switches 
(CCS) of Figure 5. In the block diagram form, the 
MCB implementation using functional characters is 
depicted as an assemblage of characters each under 
microprogram control. The microprogram resides in 
the micromemory, which consists of the MM, M1, and 
M2 characters. The word length is determined by the 
number of juxtapositioned characters of the same type. 
In general, the characters are 8 bits wide. The P1 
character is 16 bits wide. The number of G1 or P1 
rows identifies the number of registers of the P or G 
type. The G type operates directly under microprogram 
control, whereas the P type operates indirectly 
under microprogram control. The G1 character 
contains four registers for a 4 X 8-bit array. The P1 
character contains 16 registers for a 16 X 16-bit array. 
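The sizing rule in this paragraph can be written out as a short sketch. The per-character widths and register counts are those quoted above. The helper name `array_shape` and the example layout of two rows by four columns are assumptions for illustration, not a configuration taken from Figure 6.

```python
# Width and register count of one character, as quoted in the text.
CHARACTER_SPECS = {
    "G1": {"bits_wide": 8, "registers": 4},    # 4 x 8-bit array
    "P1": {"bits_wide": 16, "registers": 16},  # 16 x 16-bit array
}


def array_shape(kind, columns, rows):
    """Return (register_count, word_length_bits) for a block of identical
    characters placed `columns` side by side and `rows` deep.

    Hypothetical helper: side-by-side characters extend the word length,
    stacked rows extend the register count.
    """
    spec = CHARACTER_SPECS[kind]
    return rows * spec["registers"], columns * spec["bits_wide"]


# Example layout (assumed): four G1 characters across, two rows deep.
print(array_shape("G1", columns=4, rows=2))
```

Under these assumptions, widening the word costs one more character per row, while adding registers costs one more full row.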

Figure 6 also shows the character content of each 
module adjacent to the name of the module. The number 
to the left of the slash (/) is the total number of 
characters used per module, regardless of type. These numbers 
represent the first microprogram pass referred to in 
Table III. The number to the right of the slash is the 
number of character types used in each module, 
regardless of the number of modules. Note that the number 
of characters is additive across modules, whereas the 
number of character types is not; the total number of 
character types in the system is 10. 

Evaluation of the functional character design 

Table II shows the comparison data of the functional 
character implementation versus the existing MCB 
implementation. As may be seen, in all aspects, except 
gates committed**, the functional character imple- 
mentation results in a significant improvement over the 
existing MCB design. The number of gates committed 
is 35 percent higher for the functional character 
approach. In the LSI area, the tradeoff will no doubt 
recognize the functional character approach as 
significantly superior. An increase of 35 percent in the number 
of gates committed is a small price to pay for the 
reduction in the number of chip types and pins. 

As will be shown later for reliability purposes, a 
small number of pins in the system is far more 
important than a small number of gates, all other factors 
being equal. As seen in Table II, the number of pins 
required for the MCB implementation is 2.6 times the 
number required for the functional character 
implementation. 

As the column heading shows, the comparison in 
Table II is made between two LSI implementations: 
one representing the functional character technique, 
the other representing the conventional approach 
where every MCB card containing X number of IC's 
has been converted to an equivalent LSI chip, with the 
number of card terminals becoming the equivalent LSI 
package pins. 

The implementation with the functional characters 
resulted in a 35 percent greater throughput. This is 
because the functional character implementation assumed 
a 32 percent faster gate. For equal gate delays the two 
implementations would yield approximately equal throughput. 

** “Committed” rather than “used” is the proper descriptor since 
some gates on the chip or conventional card are unused but yet 
they are committed by virtue of being part of the chip or card. 

Figure 5 — Block diagram of existing modular computer breadboard 
overlaid with the character implementation shown in Figure 6 

Figure 6 — Functional character implementation of MCB 


The most significant point from a quality control 
point of view is that the entire computer was 
implemented with ten character types, three of which belong 
to the “micromemory” domain used for microprogramming 
of the computer modules. The micromemory 
array (MM) is the storage element which contains 
the control information. If permanent memory is used, 
it may be necessary to generate the desired information 
content on a number of different chips. However, 
effort is being expended in industry towards producing 
electronically alterable, read only memory arrays. 17 
Progress to date shows that there is promise of being 
able to use only one array with identical metalization 
patterns. This array will be encoded with the proper 
information content at the time of use. 

TABLE II — Comparison of functional character 
implementation and existing MCB 
implementation 16 

                     Functional        MCB implementation   Percent improvement   High/Low 
                     character         assuming each card   over MCB              ratio 
Item                 implementation    is an LSI chip       implementation 
                     (units) 
Types                        10                 23                 +56              2.30 
Cards (LSI chips)           206                554                 +63              2.70 
Pins committed           18,200             47,600                 +62              2.62 
Gates committed          47,200             35,000                 -35              1.35 
Gates/Pin                   2.6               0.75                +250              3.47 

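The derived columns of Table II can be re-computed from its two count columns. The sketch below does this under one assumption about the table's convention: that "percent improvement" is the reduction relative to the MCB figure, so a positive number means the functional character implementation needs fewer items. The function name `compare` is ours. The results agree with the printed columns to within rounding.

```python
# (functional character implementation, MCB with each card as an LSI chip)
TABLE_II = {
    "Types": (10, 23),
    "Cards (LSI chips)": (206, 554),
    "Pins committed": (18_200, 47_600),
    "Gates committed": (47_200, 35_000),
}


def compare(fc, mcb):
    """Return (percent reduction relative to the MCB value, high/low ratio)."""
    improvement = 100.0 * (mcb - fc) / mcb
    ratio = max(fc, mcb) / min(fc, mcb)
    return improvement, ratio


for item, (fc, mcb) in TABLE_II.items():
    improvement, ratio = compare(fc, mcb)
    print(f"{item:20s} {improvement:+7.1f} %   ratio {ratio:.2f}")
```

The gates/pin row is omitted because there a higher value is the better one, so its percentage follows a different convention than the count rows.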

It is reasonable to project that ten characters and 
ten masks are sufficient to implement the MCB and 
the majority of digital equipments. Other types of 
equipment were implemented, including A to D and 
D to A conversion logic and a DDA. All designs 
utilized the same characters but different micro- 
programs. The efficiency of gate usage was best in the 
MCB implementation and worst in the DDA. 16 It is 
premature to conclude that a different character is 
required for a more efficient implementation of the 
DDA. The MCB design was optimized through 
remicroprogramming, but this was not done with the DDA 
and A to D equipments. 

Design with functional characters saves time. During 
a six month period, the entire MCB was designed, 
microprogrammed, and remicroprogrammed several 
times. This illustrates the ease and speed of the design 
process. The improvements gained through 
microprogramming are demonstrated in Tables III and IV. 

Table III shows the improvements in terms of the 
number of characters and character types required for 
the two microprogram passes. The characters remained 
unchanged. In this comparison, the configuration of 
the MCB was identical with the presently imple- 
mented IC version. 

TABLE III — Effects of microprogram improvement 
on the functional character implementation 
of the MCB 

                         No. of characters used     No. of character types used 
                         (microprogram pass)        (microprogram pass) 
Unit                     First      Subsequent      First      Subsequent 
MU                          11           7             6            5 
CAU                         39          38             7            7 
CU                          38          35             9            9 
AU                          25          21             7            7 
Switches                     8           8             1            1 
I/O                         17          17             6            6 
Total computer system      229         206            10           10 

Further improvements were gained, as shown in 
Table IV, by restructuring the MCB with the 
appropriate remicroprogramming. The AU and CU were 
combined into one unit, eliminating some logic and the 
switch between them. This reimplementation was 
feasible with the functional character set due to the 
more general nature of the characters as contrasted 
with the custom implementation of the existing MCB. 

TABLE IV — Effects of combining the 
AU and CU of MCB 

                           Functional character      Same except 
                           implementation of the     AU and CU were 
Parameter                  existing configuration    combined 
No. of characters                 229                    182 
No. of character types             10                     10 
Fixed point direct add            9.9 µs                 4.2 µs 
Fixed point add                  11.6 µs                 6.4 µs 
Fixed point subtract             11.6 µs                 6.4 µs 
Inclusive or                     11.5 µs                 6.2 µs 
Exclusive or                     11.5 µs                 6.4 µs 
Logical and                      11.5 µs                 6.2 µs 

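The figures in Tables III and IV can be cross-checked with a few lines of arithmetic. Two things in this sketch are assumptions rather than statements from the paper. First, the MU first-pass entry is read as 11. Second, the system totals are reproduced by counting two each of MU, CU, AU and I/O (the MCB is described as a two column configuration), one CAU, and the switch row once. That multiplicity is our reading, chosen because it reproduces the printed totals of 229 and 206.

```python
# (first pass, subsequent pass) character counts per unit, from Table III.
PER_UNIT = {
    "MU": (11, 7), "CAU": (39, 38), "CU": (38, 35),
    "AU": (25, 21), "Switches": (8, 8), "I/O": (17, 17),
}

# Assumed number of each unit in the two-column MCB (not given in the table).
MULTIPLICITY = {"MU": 2, "CAU": 1, "CU": 2, "AU": 2, "Switches": 1, "I/O": 2}


def system_total(pass_index):
    """Sum the character counts over all units for one microprogram pass."""
    return sum(PER_UNIT[u][pass_index] * MULTIPLICITY[u] for u in PER_UNIT)


# Execution times in microseconds (existing configuration, AU and CU combined),
# from Table IV.
TIMES_US = {
    "fixed point direct add": (9.9, 4.2),
    "fixed point add": (11.6, 6.4),
    "fixed point subtract": (11.6, 6.4),
    "inclusive or": (11.5, 6.2),
    "exclusive or": (11.5, 6.4),
    "logical and": (11.5, 6.2),
}


def speedup(operation):
    """Ratio of the existing execution time to the combined AU/CU time."""
    before, after = TIMES_US[operation]
    return before / after


print(system_total(0), system_total(1))
for name in TIMES_US:
    print(f"{name:24s} {speedup(name):.2f}x")
```

On these numbers the combined unit runs each listed operation roughly 1.8 to 2.4 times faster while using about a fifth fewer characters.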
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Combining the AU and CU into one unit may affect 
the long term reliability. This and curiosity about the 
relative merits of multiprocessor structures, such as 
the Hughes H4400 (currently being built), vs. modular 
computers, such as the MCB, led Hughes to study 
factors affecting long term reliability. In this study, 
modules of equal complexity, with the exception of 
the switches, were assumed. The results are presented 
in Reference 9. 

Several interesting points are worth mentioning here: 

a. Multiprocessors have an improved short term 
reliability, but the long term reliability is de- 
graded somewhat. 

b. Different configurations, or organizations, signifi- 
cantly affect long term reliability. 

c. Component reliabilities (failure rate of the 
characters) markedly affect the mission relia- 
bility. 

d. The failure rates quoted for existing IC's of 
10^-8 failures per gate-hour will have to be 
significantly reduced in order for either the 
multiprocessing or the modular computer organization 
to reach the desired long time mission reliability 
objectives. 

Circuit realization of the functional characters 

This section presents circuit considerations for the 
LSI realization of the functional characters. The circuits 
must not only reflect the correct logical functions but 
also, because of the potential space applications, 
satisfy the electrical, thermal, and mechanical con- 
straints. 

The circuit solutions are to be designed to reflect a 
set of NASA design guidelines that are intended to 
insure a high probability of mission success. These 
guidelines are: 

Gates per chip:       About 100, no more than 150 
Circuit yield:        100% without yield enhancement 
Conductor spacing:    0.1 mil minimum 
Conductor width:      Current density not to exceed 10^6 amps/cm^2 
Metalization layers:  No more than 2 
Circuit type:         Bipolar TTL 

The 100 gate per chip function size limit reflects 


the 100 percent yield and TTL technology constraints. 
It is expected that LSI and TTL circuits containing 
about 100 gates will be producible with 100 percent 
yield. Other circuit technologies such as MOS may 
accommodate a larger number of gates per chip. 

As may be recalled from Table I, some functional 
characters require about 350 gates per function. The 
natural tendency would be to implement one character 
per chip. However, this is not an acceptable solution 
for TTL circuits in view of the above constraints. 
Therefore, the functional characters were subpar- 
titioned as shown in Table V. 

The intent of the table is not to select the optimal 
subpartition, but to enumerate some logical choices. 
The optimal choice will depend on assigned weightings 
for gates and pins per chip, as well as the other design 
constraints mentioned earlier. The table thus shows 
each character and the characters’ composition, using 
one or more custom or commercially available LSI/MSI 
chips. More than one subpartitioned chip is required 
to implement the functional character. The number of 
chips and chip types required is given in the second 
column as a descriptor and also in the sixth and seventh 
columns under “composite.” The columns under the 
“composite” heading state the total number of items 
required to implement one functional character. The 
columns under the first and second chip heading con- 
tain similar information on a per chip basis. 

A comparison of Tables I and V shows the following 
changes: 

1. The number of chip types is at least 20% 
greater than the number of characters; thus, 
paying a small penalty in terms of part number 
problems. 

2. The number of gates per chip dropped to about 
half, and the number of pins per chip remained 
about equal, resulting in an increase in the number 
of pins in the system by a factor of about 2. 

3. The total number of gates per function increased 
an insignificant amount. 

As is shown below, these changes tend in the wrong 
direction for obtaining improved MTBF's of the 
modular computer. As is seen from the above and Table V, 
the subpartitioned characters would require a greater 
number of bonds (pins) and will therefore operate at 
higher temperatures than the non-subpartitioned set. 
The temperature rise is due to the increased number 
of gates required and the higher current required due 
to a larger number of external gates.* More pins require 
more external gates to drive the capacitance of the 
external pins. Both factors, increased pins and higher 
temperature, increase the failure rate of the device 
and thus lower the probability of mission success. 

TABLE V — Alternate schemes for sub-partitioning 

                                                                       Composite (per character)            1st chip                      2nd chip 
Character                      Composition                               Gates   Pins    G/P  Chips  Types    Gates   Pins    G/P   Used    Gates   Pins    G/P   Used 
G1-Register                    2 custom chips, single type                 224     62    3.6      2      1      112     52    2.2      2 
L1-Logic                       2 custom chips, single type                 274    145    1.9      2      1      137    138    1.0      2 
L2-Adder                       2 identical custom chips                    258     77    3.4      2      1      129     60    2.2      2 
                               1 custom and 1 commercial chip              224     77    3.0      2      2      117     88    1.3      1     111*    43*   2.6*     1* 
L3-Input/Output                4 identical chips                           454    150    3.1      4      1      114     72    1.6      4 
                               2 identical chips with optional parity      410    149    2.8      3      2      150     87    1.7      2      110     68    1.8      1 
  (alternate scheme)           2 chips + optional parity chip              394    149    2.8      3      2      150     95    1.6      2       98     66    1.5      1 
                               Optimal 3-chip configuration                377    149    2.5      3      2      129     85    1.5      2      119     86    1.4      1 
M1-Micromemory Sequencer       3 chips-2 types                             358     91    3.9      3      2      150     92    1.6      1      104     85    1.2      2 
                               3 chips-2 types, 150 gates if I.C.          348     91    3.8      2      2      142     73    1.9      1      206     73    2.8      1 
P2-Counter                     1 custom chip                               147     81    1.8      1      1      147     81    1.8      1 
                               1 custom and 2 commercial chips             16?     81    2.0      3      2      40*      *      *     2*       83     82    0.9      1 
P3-Switch                      2 identical chips                           216    118    1.8      2      1      105     75    1.4      2 
M2-Micro Instruction Register  3 chips-2 types                             323    131    2.5      3      2      100     51    2.0      2      123     89    1.4      1 

* Commercially available chip 


Reliability considerations require a minimum number 
of bonds (pins) and the lowest junction temperature 
practicable. 18 Several other factors affect reliability. 
These are either less influential on the operational 
failure rate, or on a relative basis do not affect the 
tradeoff. For example, the quality of the package's 
hermetic seal may be an important factor in 
development and acceptance testing. But once a good seal 
has been established, it will remain good. Furthermore, 
the difficulty of making a good seal is proportionate 
to the length of the seal interface. The latter in turn is 
a function of the number of pins per package, which 
for the cases in question is about the same. 


* LSI circuits are generally built with tailored lower power 
internal gates for driving low capacitance and limited fanout within 
the chip's boundaries and higher power gates at the chip output 
in order to overcome the input output capacitance and chip 
fanout. 


Temperature is a very important consideration since 
the failure rate of the device increases about 1.8 times 
per 25°C temperature rise. 18 
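To get a feel for this rule of thumb, the sketch below extends it to other temperature rises. It assumes the factor compounds geometrically, so that a 50°C rise gives 1.8 squared. That extrapolation is our assumption. The source gives only the single figure of about 1.8 times per 25°C, and the function name is illustrative.

```python
def failure_rate_multiplier(delta_t_c, factor_per_step=1.8, step_c=25.0):
    """Failure-rate multiplier for a temperature rise of delta_t_c degrees C.

    Assumes the quoted factor (about 1.8 per 25 degrees C) compounds
    geometrically; this extrapolation is not stated in the source.
    """
    return factor_per_step ** (delta_t_c / step_c)


for rise in (0, 10, 25, 50):
    print(f"+{rise:2d} C -> x{failure_rate_multiplier(rise):.2f}")
```

Under this assumption even a 10°C rise raises the failure rate by roughly a quarter, which is why the text treats junction temperature as a first-order design constraint.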

Within specific cooling capacity, circuits, and 
packaging technology, two factors affect the device's 
temperature: 

a. The number of gates per system. 

b. The number of I/O package pins per system. 

For example, in the natural and subpartitioned 
functional characters (Tables I and V) the number of 
gates per system remains approximately constant. 
However, the number of pins nearly doubled for the 
subpartitioned case. Typically, in TTL circuits the power 
dissipation of the subpartitioned implementation is 
expected to increase. Specifically, the dissipation is 
increased by a factor of 1.08. Using the data from Table 
II, the total number of functional character gates in 
the MCB is 47,200 and the number of pins is 18,200. 

Assuming a power dissipation p and 2p or more for 
internal and external gates, respectively, the power 
dissipation for the MCB is: 

P = p (M + N/3) 

where 

M = total number of gates 

and 

N = total number of pins 

P for the subpartitioned implementation is 57,700 p 
= p (47,200 + 10,500). 

Using the same formula, the power dissipation for 
the functional character implementation (Table II) 
versus conventional MCB implementation is 53,300 p 
versus 50,900 p, respectively. 
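The quoted totals can be reproduced with a short calculation. The sketch assumes that each output pin adds one external gate costing p more than an internal gate, and that one third of the pins are outputs. The one-third fraction is an assumption on our part, adopted because it reproduces the 53,300 p and 50,900 p figures from the Table II gate and pin counts. The function name is illustrative.

```python
def relative_power(gates, pins, output_fraction=1 / 3):
    """Power dissipation in units of p (the internal-gate dissipation).

    Every gate contributes p. Each output pin is assumed to need one
    external gate that dissipates p more than an internal gate, and
    output_fraction of the pins are assumed to be outputs.
    """
    return gates + output_fraction * pins


# Gate and pin counts from Table II.
functional = relative_power(47_200, 18_200)
conventional = relative_power(35_000, 47_600)

print(round(functional, -2), round(conventional, -2))
print(f"ratio: {functional / conventional:.3f}")
```

The ratio comes out near 1.05, matching the statement that 35 percent more gates cost only about 5 percent more power once the pin savings are counted.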

Even though the number of gates is 35 percent 
greater for the functional character implementation, 
the power dissipation is only about 5 percent greater than 
that of the MCB, were it implemented with LSI's 
representing present MCB cards. This 5 percent 
difference will disappear in practice. The actual power 
difference relative to the present IC implementation 
would be in favor of the functional implementation. 

In addition to the number of pins causing increased 
power dissipation, which may be equated with in- 
creased failure rates, there are other reliability and 
cost penalties associated with an increased number of 
pins. These all result from bonding. Each pin requires 
two internal bonds (one to the metalization, the other 
to the pin). Each pin must in turn be fastened to some 
external holder (card, connector, wire, etc.). 

Every one of these junctions is a potential failure 
and a fabrication cost factor. Thus, the number of 
pins as a contributor to increased system failure rate 
manifests itself in several ways. Every effort must be 
made to keep the pin count low. 

The “ideal” LSI chip, assuming it could be built, 
would contain the largest number of gates and use 
the lowest speed-power product circuit. Figure 7 shows 
the various circuits currently available and the 
speed-power-product lines (PL). 19 Note that the “ideal” 
circuit for space applications would be located in the 
lower left corner of the figure. The ion implanted and 
complementary MOS circuits come closest to the 
“ideal” circuit. The shaded area shows the speed-power 
coverage of the P channel ion implanted MOS 
(IMOS). The area for the N channel IMOS is forecast 
to be below that shown for the P-IMOS. At the speed 
considered the complementary MOS would straddle the 
P and N areas. The complementary circuit is attractive 
as a compromise speed-power option. However, it 
requires about twice as many devices per circuit as a 
single channel circuit. Thus, a single chip would be 
unable to support a complete function, resulting in 
increased pins per system. This is undesirable, as pointed 
out earlier.* 

** Each pin must require at least one external gate, and each 
external gate dissipates at least p more units of power. Typically, 
1/3 of the pins are used for output; the others are used for inputs, 
power and ground. 


From this, we conclude that the ion implanted MOS 
type circuit (single channel, high speed, low power) 
is optimal for the functional character implementation 
of the MCB, barring producibility problems. It provides 
the desired density at 100 percent yield, lower power 
dissipation, and desired circuit producibility. 19, 20, 21 
There are not sufficient practical data to make a judg- 
ment. If the “ideal” circuit is not available, a meaning- 
ful system can be built using TTL circuitry for the 
functional character implementation. The penalty is 
increased power and pins required at a very significant 
gain of availability of proven circuit technology. 

CONCLUSIONS 

It has been demonstrated that digital equipments can 
be designed using pre-specified logical building blocks 
called functional characters. Once the logical design of 
the functional character has been accomplished, the 
system designer no longer needs to employ Boolean 
equations to specify the system. He needs only to 
specify the inputs and outputs of the characters and 
microprogram the sequence of their operations. The 
set of functional characters can be considered as 
standard and “universal” LSI chips that are sufficient 
to implement most digital equipments. Two desirable 
features of the characters are that the number of chip 
types and pins in the system are significantly reduced. 

It may be inferred that standard design automation 
programs which have as inputs Boolean statements or 
their equivalent will not be applicable as functional 
character design aids. Routing programs have the 
greatest potential of being useful. Simulation programs 
will have to operate at a macro level. A microprogram 
assembler is a desirable program. 

In order to obtain the required 10^5 hours between 
system failures, it will be necessary to improve the 
system configuration of the modular computer and to 
improve the basic circuit or module reliability. The 
functional character implementation of the modular 
computer will readily allow configuration changes. The 
module content and overall system configuration can 
be readily changed. The characters improve the 
module's MTBF because of the significant reduction 
in the number of pins. 

* The ratio of power dissipation for internal and external gates 
is much greater for MOS at the desired speed. 

Figure 7 — Speed-power products of some bipolar and MOS circuits 

The reduced number of chip types facilitates quality 
control, thereby potentially improving the module’s 
MTBF. 

Any circuit or chip wiring technique can be used to 
implement the characters. Currently, the modular 
computer is planned to be implemented with TTL, 100 
percent yield LSI technology. Other circuits and 
technologies are being evaluated. 
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Project DARE: Differential Analyzer 
REplacement by on-line digital 
simulation 


by GRANINO A. KORN 

University of Arizona 
Tucson, Arizona 


INTRODUCTION 

While batch-processed applications of convenient, 
highly developed digital continuous-system simulation 
languages are now commonplace, 1, 2 such systems do 
not provide the intimate man-machine intercourse 
cherished in analog/hybrid simulation. The DES-I 
system, 2 which combined a special simulation console 
and a digital plotter with an SDS 9300 (medium- 
sized) computer was, then, a pioneering effort, unfor- 
tunately abandoned by its manufacturer. The only 
commercially available interactive system appears 
to be the IBM CSMP 1130 system which, like its 
predecessor PACTOLUS, 2 can be programmed from 
a simple typewriter terminal. This is an interpreter 
system implemented on a small computer and thus 
yields relatively quite slow execution. 

The writer has felt quite strongly for some time 6 that 
digital on-line simulation is ready to go — we do have 
simple simulation-language programming, plus very 
reasonably priced, fast digital computers, plus new 
graphic displays. All that would seem to be needed was 
a system design which would combine these items 
(Table I), with a good deal of human-factors engineering 
to make the operator happy as well as efficient. Project 
DARE (Differential Analyzer REplacement), spon- 
sored by the National Science Foundation at the 
University of Arizona, is a continuing attempt to 
develop a series of such systems. 

Project DARE demonstrates all-digital on-line 
simulation of dynamical systems. Each DARE system 


adds a very convenient but still relatively inexpen- 
sive simulation console to a small or large digital 
computer and can replace conventional analog com- 
puters in many applications. System equations or 
block-statements and input data are entered and 
conveniently edited on a cathode-ray-tube typewriter. 
Solutions or phase-plane plots appear on a second 
cathode-ray-tube display; system parameters and 
initial conditions are readily changed for successive 
runs; displayed data can be stored for comparisons; 
programs and results may be printed and plotted for 
hard-copy report preparation; and automatic iterative 
operation is possible. With a reasonably fast digital 
computer, man-machine interaction at the console 
is rather more comfortable than with even a modern 
analog/hybrid computer. 

DARE I is a flexible CSSL-type floating-point 
system permitting relatively slow computation with 
the PDP-9 computer. DARE II is a block-diagram- 
based system which trades fixed-point operation for 
relatively very high speed on the small PDP-9, per- 
mitting, for instance, real-time flight simulation. 
DARE III and DARE IV are only in the planning 
stage and will implement economical and fast floating- 
point simulation on a time-shared CDC 6400. 

A critical study of future possibilities indicates 
that DARE-type systems could permit flight simula- 
tions including 40 Hz frequencies by 1975, but that 
modern analog computers are still a hundred times 
faster. Actual present-day practical applications, however,
employ really fast (and therefore relatively inaccurate)
analog computation so rarely that much
analog simulation could well give way to the more 
accurate, convenient, and often more economical 
digital methods demonstrated by Project DARE. 

DARE I: An on-line CSSL-type system

DARE I software, written for the PDP-9 by J. 
Goltz as a Ph.D. dissertation, 5 produces a complete 
floating-point simulation system, including the basic
monitor, editor, and loader used also by DARE II. 
DARE I source language is essentially similar to the 
SCI-sponsored CSSL. 1 Though basically equation- 
oriented, DARE I will also implement user-created 
analog or hybrid blocks as FORTRAN functions. 

TABLE I: A list of requirements
for an on-line digital simulation system

A useful on-line continuous system simulation
system must provide for:

1. Entry of system differential equations (in
   equation and/or block-statement form).

2. Entry of data (system parameters, initial
   conditions, function tables, etc.).

3. Entry of simulation parameters (frame
   time, communication interval or display
   sampling interval, maximum computation
   time, integration routine used, maximum
   tolerable error in variable-increment
   integration routines, choice of variables for
   display).

4. Editing, modification, and correction of
   the above entries.

5. Display of state variables vs. the independent
   variable (usually the time) and
   against each other (phase-plane plots).

6. Preparation of hard copy for reports in the
   form of printed tables, xy recorder plots,
   or strip-chart records.

In addition, a sophisticated simulation system
must permit “simulation studies,” viz.:

7. Computations based on results from multiple
   differential-equation-solving runs (statistics,
   cross-plots).

8. Iterative computation, i.e., repeated runs
   with system parameters and/or initial
   conditions recomputed on the basis of
   preceding runs (for optimization,
   boundary-value problems).


DARE I employs the FORTRAN compiler supplied 
with the digital computer and will be described in 
detail in a separate paper. 5 

DARE I accepts system differential equations in first-
order (state-equation) form. These equations are
simply typed in FORTRAN notation on the screen
of a CRT typewriter at the right of the DARE console
(Figure 1). An interactive CRT typewriter program
proceeds to ask for problem data and simulation
parameters. Of the latter, the frame time DT, the
maximum computing time TMAX, and also the error
EMAX for variable-increment integration, can be
entered either with the CRT keyboard or by console
digiswitches, whichever the operator prefers. Console
buttons can recall selected program or data pages
to the CRT screen for editing, or cause them to be
printed out for report preparation.

Figure 1: DARE simulation console for use with a PDP-9 or
PDP-15 computer. Programs and data are entered, edited, and
modified on the CRT typewriter at right. Up to four solution
curves, or a phase-plane plot, are produced on-line on the output
graphic CRT display at left. A simulation control panel underneath
the output display controls simulation and display, with
special push-buttons producing hard copy of programs, data, and
solutions when desired. The teletypewriter and plotter used for
this purpose are not shown.

Console switches (lower left) are sampled by the computer to
provide control inputs:

Method Switch: A rotary switch used to select the integration
routine.

DT, TMAX, EMAX: 4-decade thumbwheel switches in an
adapted FORTRAN format. The third decade reads from -5
through 0 to +5, and with the fourth decade indicates a
power of 10.

Elapsed Time: A strip of 12 lamps to indicate the progress
of computation, and to reassure the user that the computer
is actually operating when computation exceeds a few
seconds.

Sense Switches: 2-position switches for various functions,
determined by program.

Trace Finder: Pushbuttons to identify one of 5 traces on
scope display, probably by momentarily blanking it out.

Command Push-buttons (lower two rows): Lighted pushbuttons,
for purposes marked on buttons. “Type eqns,” “type data,”
and “select display” are indicators only, offering suggestions
to the user from the computer. Such suggestions can also
appear on the alphanumeric CRT display.

As the differential-equation solution proceeds, all 
state-variable values are written onto DECtape once
per “communication interval” 1 (typically every 10 
to 50 DT). Thus any selected state variable can be 
brought back for single or multiple displays and 
printout; it is possible to compare a current solution 
with a selected earlier solution display. Permanent 
graphic records are obtained with an xy recorder and 
a four-channel stripchart recorder connected to the 
display. 

The choice of integration routines for differential- 
equation solution has been discussed and rediscussed 
in many survey papers. 2-4 All DARE systems (like
the better batch-processing systems 2 ) offer a choice of 
integration formulas. With the on-line systems, con- 
sole selection of integration routine and frame time 
(time increment DT) permits very convenient compari- 
son of different integration methods in terms of stored 
solution displays. 

The flexible and convenient DARE CRT Editor 
program 5 ’* permits overwriting and correction, inser- 
tion of text, and automatic search for lines containing 
selected strings. 

A SORT/EDIT program (precompiler) sorts the 
symbol string constituting the program and creates 
a FORTRAN differential-equation-solving program, 
which is then compiled and executed. After the first 
run, data such as system parameters and initial conditions
may be changed on the CRT screen, and successive
differential-equation-solving runs are obtained
without recompilation. Iterative and statistical simu- 
lation studies can be programmed with FORTRAN 
statements. 5 

A new homemade graphic display 7 associated with 
our DARE console displays up to four variables against 
time, or selected phase-plane plots. The display uses 
one dual 9-bit (18-bit) word per display point to save 
memory and refresh time, can generate line segments 
for curve interpolation, and shares the processor 
memory through a standard PDP-9 data channel. 
This permits fast display refreshing with a minimum 
of time-wasting instructions.
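The one-word-per-point display format can be pictured with a short Python sketch. The paper states only that each display point occupies one dual 9-bit (18-bit) word; placing x in the upper nine bits and y in the lower nine, and the function names `pack_point` and `unpack_point`, are assumptions made here for illustration.

```python
MASK9 = 0x1FF  # nine one-bits: the largest value a 9-bit field can hold


def pack_point(x: int, y: int) -> int:
    """Pack two 9-bit coordinates into one 18-bit display word.

    Assumed layout (not specified in the paper): x in the high half,
    y in the low half.
    """
    if not (0 <= x <= MASK9 and 0 <= y <= MASK9):
        raise ValueError("coordinates must fit in 9 bits (0..511)")
    return (x << 9) | y


def unpack_point(word: int) -> tuple:
    """Recover the (x, y) pair from an 18-bit display word."""
    return (word >> 9) & MASK9, word & MASK9


word = pack_point(300, 17)
point = unpack_point(word)
```

With such a layout, a 512 by 512 raster position costs a single memory reference per refreshed point, which is the saving in memory and refresh time the text describes.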


DARE II: A fast block-macro system with 
an efficient precompiler 

The DARE I system demonstrates the convenience 
and power of a scale-factor-free, floating-point, equa- 
tion-oriented, on-line simulation at relatively low 
computing speed. But we also wanted to demonstrate a 
much faster on-line simulation system, which would
permit true real-time flight simulation, still using the
same small and inexpensive digital computer. With 
the PDP-9, this meant giving up floating-point opera- 
tion. DARE II machine equations must be scaled 
(much like those in analog computers) between -1 and
+1 machine unit; with the PDP-9, ones-complement
coding is employed. Overloads are detected and dis- 
played by a special subroutine. 
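A minimal Python sketch of scaled fixed-point coding with overload detection follows. It assumes an 18-bit word holding a sign bit and a 17-bit fraction in ones-complement form; the names `encode`, `decode`, and `Overload` are hypothetical, and the actual DARE II overload subroutine is not described in this paper.

```python
WORD_BITS = 18                       # PDP-9 word length
FRACTION_BITS = WORD_BITS - 1        # assumed: one sign bit, 17 fraction bits
WORD_MASK = (1 << WORD_BITS) - 1
SCALE = 1 << FRACTION_BITS           # 1 machine unit corresponds to 2**17


class Overload(Exception):
    """Raised when a scaled variable leaves the open range (-1, +1)."""


def encode(value: float) -> int:
    """Convert a machine-unit value to an 18-bit ones-complement word."""
    magnitude = round(abs(value) * SCALE)
    if magnitude >= SCALE:
        raise Overload(f"{value} exceeds 1 machine unit")
    # Ones-complement: a negative number is the bitwise complement
    # of the corresponding positive number.
    return magnitude if value >= 0 else (~magnitude) & WORD_MASK


def decode(word: int) -> float:
    """Convert an 18-bit ones-complement word back to machine units."""
    if word & (1 << FRACTION_BITS):          # sign bit set
        return -((~word) & WORD_MASK) / SCALE
    return word / SCALE


half = encode(0.5)
minus_half = encode(-0.5)
```

As on an analog computer, a physical variable would first be divided by its expected maximum so that the scaled quantity stays inside the representable range.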

To provide high execution speed, DARE II uses 
the PDP-9 macro-assembler to create macros corre- 
sponding to analog computing blocks, an approach 
first used by Gaskill and McKnight in their batch- 
processed DAS system on the IBM 7090. 2 Our system 
permits especially convenient block programming, 
with each block named by type and by the actual 
output-variable name. The example of Figure 2 is 
represented by 

    SUM   FI, S1DOT, S2DOT
    COS   COSA, A                   (1)
    MULT  S1DOT, COSA, RDOT

where the first argument of each block-macro represents
the block output. Note the convenient mnemonics used.

DARE II block-statements and data are entered 
on the dual-CRT console used also with DARE I and 
can be edited, modified, and printed out with the aid 
of the same string-processing editor. 6 DARE II
simulations of many small systems (second to sixth
order) are, however, so fast that repetitive simulation 
and display at two to 20 computer runs per second is 
possible. Keyboard entry of parameters is then too
slow for CRT demonstration of parameter-change
effects, and a “diddle knob” or joystick permitting
rapid changes of a keyboard-addressed parameter will
be added. The knob or joystick will control incrementation
of an up-down counter holding the parameter
value.

Figure 2: A block diagram

DARE II software incorporates substantial improve- 
ments over the DAS system. Block-macros may be 
typed in any order. An optimizing precompiler sorts 
statements like those in our example (1) before as- 
sembly, so that each block of the sorted program can 
operate on already computed quantities:

    COS   COSA, A
    MULT  S1DOT, COSA, RDOT         (2)
    SUM   FI, S1DOT, S2DOT

This will then permit, say, integration of the output
FI. DARE II next employs conditional assembly 10
to completely eliminate the assembly of code for redundant
store-fetch pairs corresponding to outputs and inputs of
interconnected blocks. Thus, the first macro COS COSA, A
in (2) would ordinarily end with

    STORE COSA                      (3)

while the second macro MULT S1DOT, COSA, RDOT
would start with

    FETCH COSA                      (4)

DARE II automatically cancels the redundant
pair of instructions (3), (4), although (3) would be
kept if it were needed elsewhere in the program. The
pair

    STORE S1DOT, FETCH S1DOT

will be similarly cancelled, unless S1DOT is needed
elsewhere. The DARE II precompiler program is specifically
designed to permit elimination of as many store-fetch
pairs as reasonably possible. In addition, conditional
assembly also eliminates code for unused
multi-input-summer inputs and similar unused options.
As a result, DARE II produces code which is essentially
as efficient as well-written PDP-9 machine-language
code and permits relatively very fast execution (Table II).
If core storage is scarce, DARE II block macros can
be subroutine calls to save core at the expense of some
computing time.
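The two precompiler steps described above, sorting the block statements and cancelling redundant store-fetch pairs, can be sketched in Python. This is only an illustration under stated assumptions: DARE II does this with a precompiler and PDP-9 conditional assembly, whereas the sketch uses a depth-first sort and a simple pass over accumulator-style pseudo-instructions. The function names `sort_blocks` and `generate`, the tuple representation of a block, and the instruction spellings are all hypothetical.

```python
def sort_blocks(statements):
    """Order (op, output, inputs) statements so producers precede consumers."""
    produced_by = {out: (op, out, ins) for op, out, ins in statements}
    ordered, done, in_progress = [], set(), set()

    def visit(statement):
        _, out, ins = statement
        if out in done:
            return
        if out in in_progress:
            raise ValueError(f"algebraic loop through {out}")
        in_progress.add(out)
        for name in ins:
            # Names that no block produces (state variables, parameters)
            # are assumed to be available already.
            if name in produced_by:
                visit(produced_by[name])
        in_progress.discard(out)
        done.add(out)
        ordered.append(statement)

    for statement in statements:
        visit(statement)
    return ordered


def generate(blocks):
    """Emit pseudo-instructions, cancelling redundant STORE/FETCH pairs."""
    code = []
    for i, (op, out, ins) in enumerate(blocks):
        chained = i > 0 and blocks[i - 1][1] == ins[0]
        if chained:
            # The value is still in the accumulator, so no FETCH is emitted.
            # The preceding STORE is dropped too, unless the value is
            # read again somewhere else.
            needed_elsewhere = ins[0] in ins[1:] or any(
                ins[0] in later_ins for _, _, later_ins in blocks[i + 1:]
            )
            if not needed_elsewhere:
                code.pop()
        else:
            code.append(f"FETCH {ins[0]}")
        if len(ins) == 1:
            code.append(op)  # unary block acts on the accumulator
        else:
            code.extend(f"{op} {operand}" for operand in ins[1:])
        code.append(f"STORE {out}")
    return code


# The block statements of example (1), typed in arbitrary order.
program = [
    ("SUM", "FI", ("S1DOT", "S2DOT")),
    ("COS", "COSA", ("A",)),
    ("MULT", "S1DOT", ("COSA", "RDOT")),
]
ordered = sort_blocks(program)
listing = generate(ordered)
```

For this example the sorted order is COS, MULT, SUM, and the nine instructions of the naive expansion shrink to five, with only the final output FI stored. The sketch treats a value as "needed elsewhere" only when another block reads it; outputs wanted for display or integration would need the same protection.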

TABLE II: Estimated computation times for a typical aerospace-vehicle simulation
(times are in μsec except as noted)

OPERATION                         NUMBER    DARE I             DARE II                DARE III/IV        197X
                                  REQUIRED  PDP-9/FORTRAN      PDP-9/Macro-assembler  CDC 6400           System
                                            (Floating-point)   (Fixed-point)          (Floating-point)   (Floating-point)

X + Y + Z                         100       ×1000 = 100,000    ×5 = 500               ×3.4 = 340         ×0.2 = 20
XY                                80        ×700 = 56,000      ×24 = 1920             ×7 = 280           ×1.2 = 96
AX                                60        ×700 = 42,000      ×21 = 1260             ×7 = 420           ×1.2 = 72
F(X)                              8         ×4000 = 32,000     ×52 = 416              ×80 = 640          ×10 = 80
SIN X or COS X                    10        ×6000 = 60,000     ×60 = 600              ×100 = 1000        ×15 = 150

TOTAL, ONE DERIVATIVE EVALUATION            290 msec           4.7 msec               2.7 msec           0.46 msec
Two Derivative Evaluations                  580 msec           9.4 msec               5.4 msec           0.9 msec
RK2 Integration                   12        ×3000 = 36,000     ×120 = 1440            ×25 = 300          ×4 = 48
Total Frame Time DT                         616 msec           11 msec                5.7 msec           1.4 msec
Max. Frequency at 25 Frames/cycle           0.07 Hz            4 Hz                   7 Hz               30 Hz

Although the basic PDP-9 instruction set is quite
limited (no byte manipulation, spare registers, or
add-into-memory), many analog-computer blocks can be
emulated quite nicely. As an example, a single-variable 
function with 256 uniformly spaced breakpoints can 
be formed by table lookup and interpolation in 50 
μsec, and a two-variable function with 16 × 16
breakpoints can be formed in 120 μsec. 9 It is also readily
possible to add to the DARE II macro-block reper- 
toire; one can, for instance, create blocks which pre- 
cisely correspond to the computing elements of any 
given analog computer. 
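Function generation by table lookup and interpolation can be sketched as follows. The sketch assumes linear interpolation between 256 uniformly spaced breakpoints held in a list; the helper names `make_table` and `lookup` are hypothetical, and the fixed-point PDP-9 routine itself is not reproduced in the paper.

```python
import math


def make_table(f, x_min, x_max, n=256):
    """Tabulate f at n uniformly spaced breakpoints on [x_min, x_max]."""
    step = (x_max - x_min) / (n - 1)
    return [f(x_min + i * step) for i in range(n)], step


def lookup(table, x_min, step, x):
    """Evaluate the tabulated function at x by linear interpolation."""
    position = (x - x_min) / step
    # Clamp the interval index so the last breakpoint uses the final interval.
    i = min(max(int(position), 0), len(table) - 2)
    fraction = position - i
    return table[i] + fraction * (table[i + 1] - table[i])


sine_table, sine_step = make_table(math.sin, 0.0, math.pi)
approx = lookup(sine_table, 0.0, sine_step, 1.0)
```

Because the breakpoints are uniformly spaced, the interval index comes from one subtraction and one scaling, with no search, which is what makes the method attractive on a machine with a limited instruction set.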

Like DARE I, DARE II offers a choice of integra- 
tion routines. Because PDP-9 lacks true index registers, 
the second-order Runge-Kutta routine 4 

$$
\begin{aligned}
{}^{k+1}X &= {}^{k}X + \tfrac{1}{2}(K_1 + K_2) \\
K_1 &= DT \cdot F({}^{k}X,\ k\,DT) \\
K_2 &= DT \cdot F[{}^{k}X + K_1,\ (k+1)\,DT]
\end{aligned}
\tag{5}
$$

is probably the most useful, although it requires two
evaluations of the derivative F(X, T) at each integration
step. To implement Eq. (5), our program does
not first evaluate all n $K_1$'s and then proceed to add
half of each to its ${}^{k}X$, as might be done with a real
index register. The program instead computes each
${}^{k}X + \tfrac{1}{2}K_1$ and ${}^{k}X + K_1$ before the next $K_1$ is
evaluated. When this is finished for all X, the program sets a
tally switch to mark the second part of the Runge-Kutta
routine, increments the independent variable, and
uses the ${}^{k}X + K_1$ to produce the $K_2$ and the ${}^{k+1}X$
as each derivative is computed. All integrand accumulation
is done in double precision to reduce roundoff-error
effects.
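The two-pass organization of the second-order Runge-Kutta step can be illustrated in Python. This is a sketch only: the function name `rk2_step` and the list-based state vector are assumptions, and ordinary floating-point arithmetic stands in for the double-precision fixed-point accumulation of the PDP-9 program.

```python
def rk2_step(f, x, t, dt):
    """Advance the state list x from t to t + dt by the RK2 rule of Eq. (5).

    f(x, t) must return the list of derivatives. In the first pass each K1
    is used immediately to form X + K1/2 and X + K1, so the K1 values
    themselves never need to be kept.
    """
    half_sums, predictors = [], []
    for xi, di in zip(x, f(x, t)):
        k1 = dt * di
        half_sums.append(xi + 0.5 * k1)   # X + K1/2, completed in pass two
        predictors.append(xi + k1)        # X + K1, argument for K2
    # Second pass: each K2 is added in as soon as its derivative is known.
    return [h + 0.5 * dt * di
            for h, di in zip(half_sums, f(predictors, t + dt))]


# One step of dX/dt = X from X = 1 with DT = 0.1.
new_state = rk2_step(lambda x, t: [x[0]], [1.0], 0.0, 0.1)
```

For the test equation the step gives 1 + 0.1 + 0.005 = 1.105, the first three terms of the exponential series, as expected of a second-order rule.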

With suitable interrupts from a real-time clock, a 
DARE II simulation could be readily linked to a 
hybrid-computer setup and/or to real system hardware 
(autopilot, operator positions). Note, in this connec- 
tion, that the macro-assembler system would cir- 
cumvent the reentrancy problems usually encountered 
in attempts to service multiple system interrupts with 
FORTRAN programs. 3 

A look into the future: DARE III and DARE IV 

The DARE I and DARE II systems are expected 
to be completed in 1969. A useful and readily feasible 
next step could employ a modern 24 to 36 bit machine 
somewhat larger than our PDP-9 (e.g., SEL 840B, 
SDS Sigma 5, DEC PDP-10) to speed DARE I exe- 
cution, or to add floating-point capability to DARE II. 
Such a system would cost between $120,000 and 
$200,000, which still matches the cost of a comparable 
analog-hybrid computer. Far more interesting from 


the point of view of economy as well as computing 
speed, however, is the possibility of timesharing a 
substantially larger central digital computer, such as 
a CDC 6400. In fact, economical operation of even 
a medium-sized digital machine mainly intended for 
simulation should provide for time sharing with a 
“background” batch-processing program. 

Our proposals for follow-on projects, then, envisage
implementation of DARE I- and DARE II-like simulation
systems with the University’s CDC 6400, using
the existing PDP-9/console combination as a remote
user’s station. 6400 activity would be restricted to
very fast and efficient compiling and execution of 
differential-equation-solving programs, while the string- 
processing CRT editor, data entry and display, and 
also some iterative and statistics routines in slow 
simulations, would be performed by the small proces- 
sor associated with the user’s console. It is interesting 
to note that the simulation programs and data sent 
to the central computer involve only character strings 
transmitted at type-in rates. Alphanumerical data 
from the central computer do not require much higher 
rates; extensive numerical tables could be line-printed 
at the central installation. Each DARE CRT display, 
which is refreshed by the console processor, involves 
at most 2400 9-bit data samples. For typical 10 sec 
flight simulations, this would require transmission of 
21,600 bits every 10 sec, or less than 2500 bits/second, 
so that a telephone line would do. Such operation is thus 
ideally suitable for remote time-sharing, provided 
that the 10-second-plus-overhead computer runs can 
be made available without excessive delays. 
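The transmission-rate estimate above is simple arithmetic and can be retraced in a few lines of Python. The function name `display_bit_rate` is hypothetical; the figures are those quoted in the text.

```python
def display_bit_rate(samples, bits_per_sample, run_seconds):
    """Return (total bits per display, average bits per second)."""
    total_bits = samples * bits_per_sample
    return total_bits, total_bits / run_seconds


# 2400 nine-bit samples sent once per 10-second simulation run.
total_bits, rate = display_bit_rate(2400, 9, 10)
```

The result is 21,600 bits per display and 2160 bits per second on average, below the 2500 bits/second bound stated in the text.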

Based on initial DARE II experience, smaller simu- 
lation problems would be solved much more rapidly, 
say in 0.1 sec of central-processor time. Repetitive 
console displays demonstrating parameter-change ef- 
fects would not be possible with reasonable data-trans- 
mission rates (nor would many such demonstrations 
be economically feasible)! Our proposed time-sharing
scheme is, however, ideally suited to fast iterative
simulation or statistics-taking by the central processor.
In this type of operation, only successive criterion-
function values, accumulated statistics, or similar
numbers need to be transmitted and displayed during
the iteration runs, and low transmission rates would 
again suffice. 

In a console simulation system specifically designed 
for remote time sharing, our PDP-9 is really unneces- 
sarily elaborate and could very effectively be replaced 
by the less costly 8K PDP-15, with DECtape but 
without extended arithmetic. Such a system, including 
very reasonable display facilities, would cost well
under $50,000. An even less expensive system could
be readily based on an even smaller 12- to 16-bit com- 
puter. This would save another $10,000; but the 18- 
bit word length of the PDP-15 is especially efficient 
for display-refreshing purposes and adds to the stand- 
alone capabilities of the console. Note, in this con- 
nection, that our own PDP-9-based console could 
employ DARE I for complete problem debugging 
before ever using CDC 6400 time.

With the large central computer and its relatively 
efficient compiler available, the proposed DARE III 
and DARE IV systems, corresponding to the FORTRAN-based
DARE I and the assembler-based DARE II,
may well merge into each other. The multiple
indexing needed for efficient implementation of inte- 
gration routines may well be done best by the CDC 
6400 FORTRAN compiler, while derivative computa- 
tions would probably still be executed more efficiently 
by an assembler-based system employing conditional 
assembly, as in the DARE II scheme. 

Digital vs. analog/hybrid simulation: Computing-speed considerations

Table II lists detailed estimates for various digital 
computation times required in a typical medium- 
sized aerospace simulation. Our example involves 12 
state-variable-derivative integrations, 100 three- 
term additions, 140 products, and 18 functions of one 
variable. The DARE I and DARE II systems are 
implemented on a Digital Equipment Corporation 
PDP-9 (one μsec cycle time). This machine was chosen
because it has an 18-bit rather than a 16-bit word
length, although some of the newer 16-bit machines
have much better instruction sets. The PDP-9 FORTRAN
compiler appears to be designed mainly to
save core storage and produces relatively very slow
execution. At a reasonably conservative 25 frames
(time increments DT) per period, the resulting 616-msec
frame time for our aerospace simulation would
permit the DARE I system to produce sinusoidal
oscillations at 0.07 Hz. Speedwise, we see that the only
differential analyzer our DARE I system replaces
is an old-fashioned Bush or General Electric mechanical
differential analyzer!

Figure 3: DARE console in operation with the PDP-9

A notable and inexpensive improvement in this 
situation is afforded by the fact that several PDP-9- 
sized digital computers are already available with 
hardware floating-point arithmetic. No such option 
is available with the PDP-9, but we ourselves have 
designed a current-mode logic, floating-point arith- 
metic unit for the PDP-9 which, if and when installed, 
would yield a speed improvement by a factor of at 
least 15 for the DARE I system, so that our simulated 
aerospace vehicle could wiggle at about 1 Hz, floating- 
point. 

Our block-oriented DARE II system, also running 
on the PDP-9, was specifically designed to demonstrate 
relatively high-speed, real-time flight simulation on 
the inexpensive computer. The price paid for this is 
fixed-point operation, but DARE II's efficient execution
and 11-msec frame time permit about 4 Hz in the
aerospace-simulation example. 
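The frame-time and bandwidth figures for DARE I and DARE II can be retraced from the Table II entries with a short Python sketch. The dictionary layout and function names are hypothetical. The DARE I sine/cosine time is taken as 6000 μsec per call, the value implied by the 60,000 μsec product and the 290-msec total in Table II.

```python
# Operation counts for the aerospace example (Table II, "number required").
COUNTS = {"X+Y+Z": 100, "XY": 80, "AX": 60, "F(X)": 8, "SIN/COS": 10}

# Execution time per operation in microseconds.
USEC_PER_OP = {
    "DARE I": {"X+Y+Z": 1000, "XY": 700, "AX": 700, "F(X)": 4000, "SIN/COS": 6000},
    "DARE II": {"X+Y+Z": 5, "XY": 24, "AX": 21, "F(X)": 52, "SIN/COS": 60},
}
RK2_USEC_PER_STATE = {"DARE I": 3000, "DARE II": 120}
N_STATES = 12  # state-variable-derivative integrations


def frame_time_usec(system):
    """Two derivative evaluations plus RK2 bookkeeping for every state."""
    one_evaluation = sum(COUNTS[op] * USEC_PER_OP[system][op] for op in COUNTS)
    return 2 * one_evaluation + N_STATES * RK2_USEC_PER_STATE[system]


def max_frequency_hz(frame_usec, frames_per_cycle=25):
    """Highest sinusoid frequency resolved at the given frames per cycle."""
    return 1e6 / (frames_per_cycle * frame_usec)


dare1_frame = frame_time_usec("DARE I")     # microseconds
dare2_frame = frame_time_usec("DARE II")
dare1_hz = max_frequency_hz(dare1_frame)
dare2_hz = max_frequency_hz(dare2_frame)
```

The sketch gives 616,000 μsec and about 0.065 Hz for DARE I, and 10,832 μsec and about 3.7 Hz for DARE II, which the table rounds to 0.07 Hz and 4 Hz.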

An improved 18- to 24-bit stand-alone computer 
of the future could probably produce comparable 
floating-point simulation at 4 Hz. As we have noted, 
though, the DARE III/IV systems will implement 
the economically much more important goal of time- 
shared operation with a large central digital computer, 
in this case the CDC 6400. As we have seen, very 
efficient and still relatively machine-independent exe- 
cution will be obtained by FORTRAN integration 
and macro-assembler implementation of derivative 
computations, although many operators may prefer 
an entirely equation-oriented approach. In either 
case, Table II indicates estimated frame times of the order
of 5.7 msec, thus permitting about 7 Hz operation at
25 frames per cycle. Note that this system would provide 
floating-point aerospace-vehicle simulation in real time. 

The last column of Table II extrapolates the DARE 
III/IV system to a hypothetical 197X digital computer
permitting an approximately fivefold increase
in computing speed through faster hardware and/or 
multiprocessing, instruction look-ahead, or hard-wired 
subroutines. This is in no sense a way-out extrapolation,
since digital-computer projects now on the drawing
boards already plan for a fifty-fold speed increase. Probably
the most time to be gained in simulation calculations
would be through the availability of fast scratch-pad
memories or multiple registers, which would permit
derivative computations with as few core-memory
references as possible; this will already be approximated 
in the assembler version of our CDC-6400 simulation 
program. Additional computing bandwidth would 
readily be obtained with computer systems employing 
parallel multiple processors, which would fit nicely 
into differential-equation solving schemes. Note, how- 
ever, that no manufacturer of large digital computers 
would even consider a special design for continuous 
system simulation, so that all improvements must 
make, as it were, incidental usage of developments 
in large-scale scientific and business computers. 

Let us now consider the computing-speed situation 
on the analog/hybrid computer side. One or two ana- 
log computers available for sale in 1970 will offer not 
only 0.02 percent of half-scale static accuracy, but 
also 0.1 percent of half-scale error in linear compu- 
tations at frequencies up to 1 KHz; multiplication 
and function generation are somewhat less accurate. 
In applications where such component accuracies 
suffice, even existing analog computers are thus seen 
to have a 20:1 speed advantage over the fastest 
digital-simulation systems. This bandwidth advantage 
is, moreover, not likely to decrease within the next ten
years; since 1965, improved ±10-V hybrid computers
developed in our laboratory have operated with errors
below 0.2 percent for linear and one percent for nonlinear
operations up to 10 KHz, at perfectly reasonable
cost. 11,12

Digital versus analog/hybrid: Economics

Our DARE system is implemented on about $90,000 
worth of PDP-9 and simulation console; another 
$25,000 could be very advantageously spent on a 
disk to speed compilation. When implementing the 
fixed-point DARE II language, our stand-alone system 
is roughly comparable to a modest 150-amplifier hy- 
brid computer of 1960 vintage, say, an Electronic 
Associates 231-R together with a small digital com- 
puter used for potentiometer setup, static checking, 
and some function generation. 

At a more or less comparable price, the on-line 
digital system is incomparably more convenient to 
program, check out, and operate (this is, of course, 
doubly true of the floating-point system). We also 
have, of course, all the possibilities of the 16K PDP-9 
with dual display and can produce floating-point check 
solutions with DARE I. 

Our PDP-9 installation is, however, mainly intended
as a demonstration. A more useful stand-alone installation,
based perhaps on the SDS Sigma 5, would
roughly double our cost, but would permit real-time 
floating-point flight simulation, plus some foreground- 
background time sharing. Although such a system 
would be economically competitive with a 1970 ana- 
log/hybrid computer in many applications, the full 
economic potential of on-line digital simulation will be 
realized only in a time-sharing system. The tremendous 
advantage of the time-sharing system is, simply, 
that the central processor is free for other business 
while the simulation user looks at his console-refreshed 
display, or simply scratches himself. We have already 
seen that the communication requirements for time- 
shared simulation are quite small. 

I believe that the foregoing considerations clearly 
indicate the area of future analog/hybrid vs. digital 
simulation competition. In applications where analog/ 
hybrid and digital simulation systems compete at equal 
computing speeds, i.e., in most real-time or “slow” 
simulation, the new digital systems will win overwhelmingly
both on economic and on human-engineering grounds.
Since, on the other hand, reasonably complex nonlinear 
digital simulations will not be able to run at frequencies 
much in excess of 100 Hz, faster simulation will still 
belong on analog/hybrid computers.

A crucial question confronting the simulation com- 
munity (and specifically the analog-computer industry) 
is, then, this: where, and how large, are the application 
areas of really fast analog/hybrid computation? The
most immediately important would seem to be: 

1. Parameter and functional optimization, including 
trajectory optimization. 

2. Random-process simulation, including optimi- 
zation of statistics, communication-system sim- 
ulation, and parameter-tolerance studies. 

3. Solution of partial differential equations, including
techniques requiring multiplexing of
analog computing elements. 

It is in precisely these applications that the very 
large number of computer runs needed may give the 
analog/hybrid computer a measure of economic ad- 
vantage even over digital batch processing. Even 
here, only important and frequent applications could 
tilt the balance away from time-shared digital simu- 
lation, which saves much analog-computer scaling, 
setup, checkout, and “head-scratching” time, not to 
speak of computer amortization. Cost estimates for 
different simulation methods sometimes omit these 
“hidden” costs. 

I wonder, finally, how much practical high-speed
analog/hybrid computation is really done in the aerospace,
chemical and nuclear-energy industries, which
are, at this time, the principal consumers of continuous- 
system simulation. Our own laboratory’s work on the 
design and applications of very fast analog/hybrid 
computers, 11,12 for instance, has always elicited much
polite interest, but very little imitation. By contrast, 
much current aerospace work involves “slow” or real- 
time hybrid simulation of aerospace systems, with the 
digital computer doing housekeeping functions such 
as static checking, plus function generation and, per- 
haps, some accurate trajectory integration. The re- 
sulting accuracy and software problems combine all 
the worst features of both analog and digital compu- 
tation; the main reason for employing hybrid simula- 
tion at all is either the existence of actual hardware 
in the loop or some 20- to 50-Hz components due to 
hydraulic servos and/or aeroelasticity. This type of
hybrid simulation can be swallowed by future on-line 
digital systems like Jonah by the whale. For the 1970s, 
the simulation community would be well advised to 
include on-line digital simulation in its planning,
together with some careful reconsideration of faster 
analog/hybrid techniques.

MOBSSL-UAF: An augmented block
structured continuous system simulation 
language for digital and hybrid 
computers 

by M. J. MERRITT and D. S. MILLER 

USC School of Engineering 
Los Angeles, California 


INTRODUCTION 

The motivation for the development of digital simu- 
lation languages may be seen by tracing the thoughts 
of two widely different people preparing to analyze a 
continuous dynamic system. Both are experienced 
engineers and mathematicians, but the first is a novice 
programmer with little or no FORTRAN experience. 
Both have access to one or more digital computers. 
The novice's thoughts might be as follows: “I do not 
know FORTRAN and I'm not really interested in 
learning it just to solve this problem. I have heard 
that digital continuous simulation languages are simple 
and easy to use. I'll try one.” The experienced programmer,
on the other hand, might think, “I only need a
few quick solutions; why bother with a FORTRAN
program? I'll use a simulation language for convenience.”

Clancy and Fineberg, 1 in 1965, compiled a compre- 
hensive list of some 31 simulation languages. One of 
these would fit the needs as well as the computer of 
both individuals. The novice is looking for a simple,
easy-to-use language. The experienced programmer is
looking for one that compiles and runs efficiently while 
providing as much flexibility and convenience as 
possible. Since none of the presently available languages
achieves the same running efficiency as a FORTRAN
program written specifically to solve the same problem,
its conveniences must weigh heavily in the programmer's
mind.

If a language is to satisfy the needs of these, as well 
as a broad spectrum of users in between, then it must 
possess the following characteristics:

1. It must be easy to learn.

2. Its language statements must be simple and 
easy to interpret. 

3. It should not require any knowledge of FOR- 
TRAN. 

4. It should allow on-line interaction during both 
problem preparation and problem execution. 

5. The language should contain sufficient computational,
control, and input/output elements so
that only exceptionally complex tasks require 
FORTRAN or other non-simulation language 
statements. 

Of the widely distributed languages, PACTOLUS 2 
and IBM 1130 CSMP 3 come the closest to meeting 
these requirements. Unfortunately, they lack many 
necessary computational and control functions. The 
popular MIDAS 4 language is not interactive, while 
MIMIC, DSL 90 and 360 CSMP 5 are difficult to learn 
and very FORTRAN oriented. 

All of these requirements may be met by combining 
two things: a computer graphics terminal and an
augmented block structured simulation language. The
graphics terminal is chosen for its interactive communication
abilities, and the block structured simulation language
for its simple language statements. Further,
the graphics terminal's ability to display large quantities
of instructional and reference information quickly
allows it to guide the new programmer through each
step of the problem preparation.

A block structured language may be visualized as 
a collection of input-output boxes (see Table 1), each 
of which carries out a basic mathematical operation. 
The user’s inputs, the language statements, describe 
the way in which these pre-defined functional blocks 
are to be inter-connected. A typical language statement 
might be: 54, M, 1, 7 which might mean: the output of 
the block element designated as #54 is the product of 
the outputs of the block elements designated #1 and
#7. The advantage of block structured languages
(MIDAS, PACTOLUS, 1130 CSMP) lies in the simplicity
of their language statements. Their major disadvantage
is their rigidity, i.e., the user is restricted to those operations
which may be mechanized with the available
mathematical and control operations. This disad- 
vantage may be overcome by constructing process 
oriented block elements which cause higher order 
mathematical operations to be carried out. The Gra- 
dient Processor and Disk Input/Output block elements, 
described below, are two such elements. 
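The statement format described above can be sketched in a few lines of Python. This is only an illustration: the `BlockStatement` class, the `parse_statement` and `evaluate` helpers, and the treatment of the type code `M` as a product are assumptions taken from the hypothetical statement in the text, not part of any actual MIDAS, PACTOLUS, or CSMP implementation.

```python
from dataclasses import dataclass


@dataclass
class BlockStatement:
    number: int        # block element being defined
    type_code: str     # element type code, e.g. "M" in the example statement
    inputs: tuple      # block numbers whose outputs feed this element


def parse_statement(text: str) -> BlockStatement:
    """Split 'number, type, input, input, ...' into its fields."""
    fields = [field.strip() for field in text.split(",")]
    number, type_code, *inputs = fields
    return BlockStatement(int(number), type_code, tuple(int(i) for i in inputs))


def evaluate(stmt: BlockStatement, outputs: dict) -> float:
    """Evaluate one element; only the hypothetical product code 'M' is handled."""
    if stmt.type_code == "M":
        result = 1.0
        for block in stmt.inputs:
            result *= outputs[block]
        return result
    raise ValueError(f"unsupported type code {stmt.type_code!r}")


stmt = parse_statement("54, M, 1, 7")
outputs = {1: 2.0, 7: 3.5}          # assumed outputs of blocks #1 and #7
outputs[stmt.number] = evaluate(stmt, outputs)
print(stmt, outputs[54])
```

The point of the sketch is that a statement carries no arithmetic of its own; it only names a pre-defined block and says which other blocks it is wired to.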

The MOBSSL language 

MOBSSL-UAF, which stands for Merritt and 
Miller's Own Block Structured Simulation Language, 
Unpronounceable Acronym For, is a descendant of 
MIDAS through PACTOLUS and IBM 1130 CSMP. 
It differs from its antecedents in the following ways: 

1. Continuous and iterative gradient modeling 
and optimization procedures are performed by a 
Gradient Processor block element. 

2. Analog to Digital and Digital to Analog conversion 
block elements facilitate closed loop 
hybrid computation, on-line interaction and 
control of analog plotting devices: x-y plotters, 
stripchart recorders, memoscopes and oscilloscopes. 

3. A Disk Output block element allows up to 10 
block outputs to be written in a pre-defined 
disk data set. A Disk Input block element reads 
up to 10 inputs from a pre-defined data set. 
Utility subroutines allow these data sets to be 
referenced by FORTRAN programs. 


4. Iterative and parametric computations are 
facilitated by allowing control cards to specify a 
SIMULATION MODE. When a solution is 
completed, the SIMULATION MODE determines 
which of the following is to occur: 

STOP: terminate the job. 

PCHG: read data cards and modify parameters 
and initial conditions accordingly. The last data 
card specifies the SIMULATION MODE for the 
next solution, which is begun immediately. 

RUN: begin a new solution immediately. 
Successive solutions may be modified by on-line 
control or by the gradient and iterative block 
elements. This mode contains no exit and must be 
terminated by operator intervention or by forcing 
an error exit, e.g., by taking the square root of a 
negative number. 

Process oriented block elements, like the Gradient 
Processor and the Disk Input/Output blocks, make it 
possible for unsophisticated programmers to study 
complex dynamic systems, modeling and optimization 
problems, and exercise on-line control without first 
learning the FORTRAN language. 

The communication and interactive features of 
MOBSSL, the Hybrid block elements and the SIMU- 
LATION MODE, were dictated by the computational 
facilities of the System Simulation Laboratory at the 
University of Southern California. In this laboratory, 
each user receives ten minutes of computer time on a 
first come, first served, programmer present basis. This 
period is too short to encourage the use of the console 
typewriter for communication purposes. Instead, the 
user may read pre-planned parameter changes from 
punched cards, or operate control switches and po- 
tentiometers connected to the Hybrid block elements. 
The effects of these changes are observed in the line 
printer listings or on analog displays operated by the 
Hybrid elements. 

The gradient processor 

Optimization and modeling of dynamic systems may 
be re-formulated as a search for the extrema of a 
scalar functional of a vector with free parameters. 



TABLE I - Definition of MOBSSL elements 

ELEMENT TYPE | TYPE CODE | DESCRIPTION & COMMENTS 
BANG-BANG | B | 
DEAD SPACE | D | $e_0 = e_1 - P_2$ for $-\infty < e_1 < P_2$; $e_0 = 0$ for $P_2 \le e_1 \le P_1$; $e_0 = e_1 - P_1$ for $P_1 < e_1 < +\infty$; $P_1 > 0$, $P_2 < 0$ 
EXPONENT | E | $e_0 = e^{(P_1 e_1 + P_2 e_2 + P_3 e_3)}$ 
 | | $e_0 = P_1 e_1$ 
 | | $e_1 > 0$ 
INTEGRATOR | I | $e_0 = P_1 + \int (e_1 + P_2 e_2 + P_3 e_3)\,dt$. Maximum no. of integrators = 75. Minimum no. of integrators = 1. Fourth order Runge-Kutta. 
 | | Random number generator. Generates random number between $\pm 1$: $-1 < e_0 < +1$ 


TABLE I (continued) 

ELEMENT TYPE | TYPE CODE | DESCRIPTION & COMMENTS 
LIMITER | | $e_0 = P_2$ for $-\infty < e_1 < P_2$; $e_0 = e_1$ for $P_2 \le e_1 \le P_1$; $e_0 = P_1$ for $P_1 < e_1 < +\infty$; $P_1 > P_2$ 
MAGNITUDE | | $e_0 = \lvert e_1 \rvert$ 
NEGATIVE CLIPPER | | $e_0 = 0$ for $-\infty < e_1 < 0$; $e_0 = e_1$ for $0 \le e_1 < +\infty$ 
OFFSET | | 
POSITIVE CLIPPER | | 
QUIT | | $e_1 > e_2 \Rightarrow$ QUIT (terminate run); $e_1 \le e_2 \not\Rightarrow$ QUIT. Quit element terminates the run at the end of the DT step in progress. 
RELAY | | $e_0 = e_3$ for $-\infty < e_1 < 0$; $e_0 = e_2$ for $0 \le e_1 < +\infty$. SPDT unilateral relay. 
STORAGE | S | Enables block outputs from end (TF) of a given run (determined by $P_2$) to be stored, that is, continually available for the succeeding $P_3$ runs. $e_0(\text{run } 1) = P_1$; $e_0[\text{run}(n+1)] = e_1[\text{run}(n)]_{TF} + e_2[\text{run}(n)]_{TF} + e_3[\text{run}(n)]_{TF}$ if $(n - P_2) \bmod P_3 = 0$; $e_0[\text{run}(n+1)] = e_0[\text{run}(n)]$ if $(n - P_2) \bmod P_3 > 0$ 























TABLE I (continued) 

ELEMENT TYPE | TYPE CODE | DESCRIPTION & COMMENTS 
TIME PULSE GENERATOR | T | Generates impulse train of unit amplitude and period $P_1$ which starts when $e_1 \ge 0$ (to delay start of pulse train keep $e_1$ negative). 
UNIT DELAY | | $e_0(t) = P_1$ at $t = 0$; $e_0(t) = e_1(t - \Delta t)$. Max. no. of unit delays = 75. Used as a delay element and in conjunction with Z element for sampled data systems and difference equation computations. 
MULTIPLIER | X | 
WYE | | Logical branch element used in implicit function generation. 
ZERO-ORDER HOLD | Z | $e_0 = P_1$ if $t = 0$ and $e_2 < 0$; $e_0 = e_1$ if $e_2 > 0$; $e_0$ unchanged if $e_2 \le 0$ 
SUMMER | + | $e_0 = \pm e_1 \pm e_2 \pm e_3$. This is the only element that accepts negative block numbers. 
VACUOUS | | Used in conjunction with WYE element for implicit function generation. 
WEIGHTED SUMMER | W | $e_0 = P_1 e_1 + P_2 e_2 + P_3 e_3$ 


TABLE I (continued) 

ELEMENT TYPE | TYPE CODE | DESCRIPTION & COMMENTS 
DIVIDER | | $e_0 = e_1 / e_2$, $e_2 \ne 0$. If $e_2 = 0$, program interrupt occurs and 360 supervisor generates message indicating exponent overflow exception. 
 | | $e_1 > 0$. If $e_1 \le 0$, problem processing is terminated and 360 supervisor generates error message indicating an attempt to take logarithm of a number $\le 0$ has occurred. 
SINE (DEGREES) | | $e_0 = \mathrm{SIN}(P_1 e_1 + P_2 e_2 + P_3 e_3)$. Inputs in degrees. 
SINE (RADIANS) | | $e_0 = \mathrm{SIN}(P_1 e_1 + P_2 e_2 + P_3 e_3)$. Inputs in radians. 
COSINE (DEGREES) | | $e_0 = \mathrm{COS}(P_1 e_1 + P_2 e_2 + P_3 e_3)$. Inputs in degrees. 
COSINE (RADIANS) | | $e_0 = \mathrm{COS}(P_1 e_1 + P_2 e_2 + P_3 e_3)$. Inputs in radians. 
TANGENT (DEGREES) | | 


TABLE I (continued) 

ELEMENT TYPE | TYPE CODE | DESCRIPTION & COMMENTS 
ANALOG to DIGITAL CONVERTER | AD | $e_0 = e_{ADC*P_1}$, $\lvert e_{ADC*P_1} \rvert < 100.0$ V. Input at Beckman 2132 patchboard ADC*$P_1$. 
DIGITAL to ANALOG CONVERTER | DA | $e_{DAC*P_1} = e_1$, $\lvert e_1 \rvert < 100.0$. Output at Beckman 2132 patchboard DAC*$P_1$ output. Apply +100.0 V to DAC*$P_1$ hi and -100.0 V to DAC*$P_1$ lo terminals. 
VARIABLE CONSTANT | VK | Enables user to vary constants from run to run. Output for run 1 is $P_1$. Output for subsequent runs is determined by output for previous runs and inputs at end of previous run (TF) and parameters $P_2$ and $P_3$. $e_0(\text{run } 1) = P_1$; $e_0[\text{run}(n+1)] = e_0[\text{run}(n)] + e_1[\text{run}(n)]_{TF} + P_2 e_2[\text{run}(n)]_{TF} + P_3 e_3[\text{run}(n)]_{TF}$ 
MALE | | Used in conjunction with FEMALE block to produce osculations. For additional details see page 7430. 
FEMALE | | Accepts only MALE block as input. Produces more blocks. 
SPECIAL ELEMENTS | 1, 2, 3, 4, 5 | User supplied Fortran subroutines not restricted to 3 inputs, 3 parameters and 1 output. Inputs and outputs of all blocks plus all MOBSSL variables (T, $\Delta$T, TTOT, TSAMP, etc.) are available. Approximately 1100 words of core available for this purpose. 

Consider the modeling problem shown in Figure 1. 
The task is to select those values of the parameter 
vector, $\alpha$, which result in minimizing the output of the 
Criterion Function Evaluator. The integral squared 
difference between the model output and the output of 
the unknown system is often selected as the scalar 
criterion function. The Gradient Processor, GP, block 
element controls the systematic variation of the parameter 
vector, $\alpha$, so as to locate the desired minima. 

Let the criterion function or cost function which is 
to be extremized be denoted as $\phi(\alpha_1, \ldots, \alpha_n)$ or just 
$\phi$. If $\phi$ is a non-linear function of the parameter 
vector, $\alpha$, as it usually is, then iterative search procedures 
must be employed to find the extrema. Of the 
procedures described by Bekey and Karplus, 6 the most 
often used is the method of steepest ascent. The Gradient 
Processor, GP, block element mechanizes an 
iterative form of the method of steepest ascent described 
below. 

Figure 1 - Application of the gradient processor element 
to parameter identification 

The gradient of the criterion function on the $i$th 
iteration, $\nabla\phi^i$, in the $\phi \times \alpha_1 \times \cdots \times \alpha_n$ space may be estimated 
by perturbing each parameter $\alpha_j$ by an amount 
$\Delta\alpha_j^+$ and $\Delta\alpha_j^-$: 

$$
\nabla\phi^i =
\begin{bmatrix}
\dfrac{\phi(\alpha_1^i + \Delta\alpha_1^+, \alpha_2^i, \ldots, \alpha_n^i) - \phi(\alpha_1^i - \Delta\alpha_1^-, \alpha_2^i, \ldots, \alpha_n^i)}{\Delta\alpha_1^+ + \Delta\alpha_1^-} \\
\vdots \\
\dfrac{\phi(\alpha_1^i, \alpha_2^i, \ldots, \alpha_n^i + \Delta\alpha_n^+) - \phi(\alpha_1^i, \alpha_2^i, \ldots, \alpha_n^i - \Delta\alpha_n^-)}{\Delta\alpha_n^+ + \Delta\alpha_n^-}
\end{bmatrix}
$$

This computation requires 2n solutions of the 
equations which determine $\phi$, with appropriate cyclic 
control of the parameter vector $\alpha$. At the conclusion 
of the 2n solutions, the gradient vector, $\nabla\phi$, is computed. 
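The two-sided perturbation estimate can be sketched as follows. This is a minimal illustration, not the GP block's actual code: `estimate_gradient` and `example_phi` are hypothetical names, and the quadratic criterion stands in for a full simulation run that would normally produce $\phi$.

```python
from typing import Callable, List, Sequence


def estimate_gradient(
    phi: Callable[[Sequence[float]], float],
    alpha: Sequence[float],
    d_plus: Sequence[float],
    d_minus: Sequence[float],
) -> List[float]:
    """Estimate each gradient component from two perturbed solutions."""
    grad = []
    for j in range(len(alpha)):
        raised = list(alpha)
        lowered = list(alpha)
        raised[j] += d_plus[j]      # alpha_j + delta_alpha_j(+)
        lowered[j] -= d_minus[j]    # alpha_j - delta_alpha_j(-)
        grad.append((phi(raised) - phi(lowered)) / (d_plus[j] + d_minus[j]))
    return grad


calls = []  # one entry per "solution", to show that 2n solutions are needed


def example_phi(a: Sequence[float]) -> float:
    """Hypothetical criterion function standing in for a simulation run."""
    calls.append(tuple(a))
    return (a[0] - 1.0) ** 2 + 3.0 * (a[1] + 2.0) ** 2


grad = estimate_gradient(example_phi, [0.0, 0.0], [0.1, 0.1], [0.1, 0.1])
print(grad, len(calls))
```

With two parameters the criterion is evaluated four times, matching the 2n solutions per gradient estimate stated above.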

Let the $i$th estimate of the parameter vector be 
denoted by $\alpha^i$. Let $\alpha^0$ be any arbitrarily selected set 
of parameters; then the successive estimates of $\alpha$ 
are computed from 

$$\alpha^{i+1} = \alpha^i + M^i \nabla\phi^i$$

where $M^i$ is an $n$ by $n$ diagonal matrix of the form 

$$
M^i =
\begin{bmatrix}
m_1^i & & & 0 \\
 & m_2^i & & \\
 & & \ddots & \\
0 & & & m_n^i
\end{bmatrix}
$$

The $m_j^i$ are positive if a maximum of $\phi$ is sought 
(steepest ascent) and they are negative if a minimum is 
sought (steepest descent). The magnitudes of the $m_j^i$ 
may be used to restrict the size of each parameter step 
as follows: 

Let $\|\Delta\alpha^i\|_u$, the unnormalized parameter step, be 
defined as 

$$\|\Delta\alpha^i\|_u = \left[ \sum_{j=1}^{n} \left( k_j \, \frac{\partial\phi(\alpha^i)}{\partial\alpha_j} \right)^2 \right]^{1/2}$$

Let MSL be a pre-defined constant, equal to the 
largest parameter step, $\|\Delta\alpha\|$, desired. 

If 

$$\|\Delta\alpha^i\|_u > \mathrm{MSL}$$

then 

$$m_j^i = \frac{\mathrm{MSL} \cdot k_j}{\|\Delta\alpha^i\|_u}$$

otherwise 

$$m_j^i = k_j$$

where the $k_j$ are constants supplied as inputs to the 
GP blocks. 
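A sketch of the step-limiting rule follows. It rests on one reading of the rule, assumed here: the unnormalized step is taken as the Euclidean norm of the products $k_j \, \partial\phi/\partial\alpha_j$, and when that norm exceeds MSL every gain is scaled by MSL divided by the norm. The function names and the numbers in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence


def step_gains(k: Sequence[float], grad: Sequence[float], msl: float) -> List[float]:
    """Return the gains m_j, scaled down if the raw step would exceed msl."""
    unnormalized = math.sqrt(sum((kj * gj) ** 2 for kj, gj in zip(k, grad)))
    if unnormalized > msl:
        return [msl * kj / unnormalized for kj in k]
    return list(k)


def update(alpha: Sequence[float], k: Sequence[float],
           grad: Sequence[float], msl: float) -> List[float]:
    """One parameter update: alpha + M * grad, with M diagonal."""
    gains = step_gains(k, grad, msl)
    return [a + m * g for a, m, g in zip(alpha, gains, grad)]


# Negative gain: a minimum is sought. The raw step 0.6 is cut back to 0.3.
limited = update([0.4], [-0.06], [-10.0], 0.3)
# A smaller gradient gives a raw step of 0.12, which is left untouched.
unlimited = update([0.4], [-0.06], [-2.0], 0.3)
print(limited, unlimited)
```

Scaling all gains by the same factor keeps the direction of the step and only shortens it, which is the usual reason for normalizing in this way.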

As with all iterative procedures, it is difficult to 
determine when to stop the iteration. The Gradient 
Processor block element offers three separate stopping 
options, all controlled by input parameters: 

1. If the number of iterative cycles exceeds a 
specified number, the simulation is terminated. 

2. If $\phi$ is being maximized and exceeds a given 
value, or $\phi$ is being minimized and is less than a 
given value, then the simulation is terminated. 

3. If $|\phi(\alpha^{i+1}) - \phi(\alpha^i)|$ is less than a given constant, 
the simulation is terminated. 
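The three stopping options can be collected into a single check, sketched below. The function name, argument names, and returned strings are hypothetical; treating a zero threshold as "option not in use" follows the note on PARAMETER 1 in Table II.

```python
from typing import Optional


def stopping_reason(
    iteration: int,
    max_iterations: int,
    phi_old: float,
    phi_new: float,
    maximize: bool,
    phi_target: Optional[float] = None,
    min_change: float = 0.0,
) -> Optional[str]:
    """Return the reason for stopping, or None if the search should go on."""
    # Option 1: too many iterative cycles.
    if iteration > max_iterations:
        return "iteration limit"
    # Option 2: the criterion has passed the desired value.
    if phi_target is not None:
        if maximize and phi_new > phi_target:
            return "target reached"
        if not maximize and phi_new < phi_target:
            return "target reached"
    # Option 3: the criterion changed by less than a given constant.
    if min_change > 0.0 and abs(phi_new - phi_old) < min_change:
        return "change below threshold"
    return None


print(stopping_reason(1, 30, 0.90, 0.13, maximize=False, phi_target=0.0, min_change=0.003))
print(stopping_reason(5, 30, 0.1301, 0.1300, maximize=False, phi_target=0.0, min_change=0.003))
```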

At the conclusion of each iterative cycle, a total of 
2n + 1 runs, the values of the new parameter vector, 
$\alpha^{i+1}$, the old criterion function, $\phi(\alpha^i)$, the new 
criterion function, $\phi(\alpha^{i+1})$, and the magnitude of the 
stopping criterion being used are printed. Additional 
listings of the gradient vector and of individual parameter 
changes, both before and after normalization, are 
optional. 

MOBSSL will accept up to 11 GP blocks. They 
must all be assigned sequential block numbers. The 
first GP block is not associated with a parameter, but 
accepts inputs and constants used to control the sequencing 
of the remaining GP blocks. The outputs of 
the last $n$ GP blocks are the values of the $n$ parameters 
$\alpha_1, \ldots, \alpha_n$, where $n \le 10$. A single parameter modeling 
problem is shown in Figure 2. 

Figure 2 - MOBSSL block diagram for second-order 
system damping ratio example 

The functions of the GP blocks' inputs and parameters 
are given in Table II. 

Parameter identification using the GP element 

The damping ratio, $\zeta$, of a linear, second order 
system is not known. The response of this second order 
system to a step input is available. A model equation is 

$$\ddot{x} + 2\alpha_1 \dot{x} + x = 1.$$

The actual damping ratio, $\zeta$, of the system was set to 
0.7. 

The MOBSSL program to carry out the iterative 
steepest descent minimization of the integral of the 
absolute value of the difference between the two step 
responses is shown in Figures 2 and 3. The initial value 
of the parameter $\alpha_1$ is selected as 0.4. The MOBSSL 
results for the first iteration are shown in Figure 4. 
The first column of tabular data shown is time; the 
second is the output of the 2nd GP block, $\alpha_1$; the third 
column is the output of the Criterion Function Evaluator, 
which is $\phi(\alpha)$ at the end of a solution; the fourth 
column is the step response of the system containing 
the unknown parameter; and the last column and plot 
show the step response of the model containing parameter 
$\alpha_1$. 

As can be seen from these results, $\alpha_1$ started at 0.4 
and after one iteration had reached 0.6455, heading 
towards 0.7. 
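One iteration of this search can be imitated outside MOBSSL with the sketch below. It is an approximation under stated assumptions: the perturbations of 0.2, the gain of -0.06, the step limit of 0.3, the integration interval of 0.04 and the total time of 10 are read from the listing in Figure 3, while the hand-written Runge-Kutta integrator, the trapezoidal integral of the absolute error, and all function names are choices made for this example. It is not expected to reproduce the printed value of 0.6455 exactly.

```python
import math


def step_response(zeta, dt=0.04, t_end=10.0):
    """Fourth-order Runge-Kutta solution of x'' + 2*zeta*x' + x = 1 from rest."""
    def deriv(x, v):
        return v, 1.0 - 2.0 * zeta * v - x

    x, v = 0.0, 0.0
    samples = [x]
    for _ in range(round(t_end / dt)):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = deriv(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = deriv(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        samples.append(x)
    return samples


def criterion(alpha, reference, dt=0.04):
    """Integral of |system response - model response| (trapezoidal rule)."""
    model = step_response(alpha, dt)
    errors = [abs(r - m) for r, m in zip(reference, model)]
    return dt * (sum(errors) - 0.5 * (errors[0] + errors[-1]))


reference = step_response(0.7)           # system with the "unknown" damping ratio
alpha, d_plus, d_minus = 0.4, 0.2, 0.2   # initial estimate and perturbations
gain, max_step = -0.06, 0.3              # negative gain: a minimum is sought

grad = (criterion(alpha + d_plus, reference)
        - criterion(alpha - d_minus, reference)) / (d_plus + d_minus)
step = gain * grad
if abs(step) > max_step:                 # single-parameter form of the step limit
    step = math.copysign(max_step, step)
alpha_new = alpha + step
print(alpha_new)
```

The estimate moves from 0.4 toward 0.7 because the criterion is smaller at 0.6 than at 0.2, so the gradient is negative and the negative gain turns it into a positive step.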


TABLE II - Gradient processor inputs and parameters 

FIRST GP BLOCK 

INPUT 1        The criterion function, $\phi$. 

INPUT 2        A stopping criterion: maximum or minimum value of $\phi$ desired; usually supplied by a constant block. 

INPUT 3        Not used. 

PARAMETER 1    A stopping criterion: if $|\phi(\alpha^{i+1}) - \phi(\alpha^i)| <$ PAR 1, stop. If zero, use another criterion. 

PARAMETER 2    If positive, maximize $\phi$. If negative, minimize $\phi$. Magnitude is largest allowable parameter step $\|\Delta\alpha\|$. 

PARAMETER 3    A stopping criterion: number of allowable iterations. If positive, print optional information. If negative, suppress it. 

ALL OTHER BLOCKS 

INPUT 1        Steepest ascent gain constant $k_j$; usually supplied by constant block element. 

INPUT 2        Not used. 

INPUT 3        Not used. 

PARAMETER 1    Initial estimate of parameter value, $\alpha_j^0$. 

PARAMETER 2    Positive parameter perturbation, $\Delta\alpha_j^+$. 

PARAMETER 3    Negative parameter perturbation, $\Delta\alpha_j^-$. 


Iterative computational elements 

Iterative computational processes are facilitated by 
two MOBSSL elements: the STORAGE element and 
the VARIABLE CONSTANT element, designated S 
and VK respectively. These elements allow the results 
obtained in previous solutions to modify future solutions. 
When MOBSSL ends a solution, it examines the SIMU- 
LATION MODE established by the programmer’s 
control cards. If the RUN mode is in effect, and the 
Gradient Processor, GP, element is not in use, the 
STORAGE and VARIABLE CONSTANT elements 
are processed to determine their new outputs. All other 
elements are reset to their initial conditions and the 
independent variable is reset to zero. The solution 
counter, N, which begins at 1, is incremented by 1. 
When all of the bookkeeping is completed, MOBSSL 
begins the new solution. 

CONFIGURATION SPECIFICATIONS 

OUTPUT NAME        BLOCK NUMBER   BLOCK TYPE   INPUT 1   INPUT 2   INPUT 3 
GP HEADER                1            GP           9        12         0 
GP PARAM 1               2            GP          11         0         0 
MODEL OF THING           3            I            4         0         0 
MODEL OF THG DOT         4            I           10         3         5 
PARAM MULTIPLIER         5            X            4         2         0 
ERROR                    6            +           30        -3         0 
ERROR**2                 7            X            6         6         0 
SQRT(ERROR**2)           8            H            7         0         0 
CRITERION FCN            9            I            8         0         0 
INPUT                   10            K            0         0         0 
GRADIENT GAIN           11            K            0         0         0 
GAIN FOR MIN/MAX        12            K            0         0         0 
THING                   30            I           40         0         0 
THING DOT               40            I           10        30        40 

INITIAL CONDITIONS AND PARAMETERS 

BLOCK    IC/PAR1      PAR2       PAR3 
  1      0.00300    -0.30000   30.00000 
  2      0.40000     0.20000    0.20000 
 10      1.00000     0.0        0.0 
 11     -0.06000     0.0        0.0 
 12      0.0         0.0        0.0 
  4      0.0        -1.00000   -2.00000 
 40      0.0        -1.00000   -1.40000 

PROGRAM MODE STOP 
INTEGRATION INTERVAL IS 0.04000 
TOTAL TIME IS 10.00000 
PRINT INTERVAL IS 0.40000 
BLOCKS TO BE PRINTED ARE 2, 9, 30 
BLOCK TO BE PLOTTED IS 3; RANGE OF PLOTTED VARIABLE IS 2.00000 

Figure 3 - MOBSSL listing of configuration, parameter and timing data for damping ratio example 


The VARIABLE CONSTANT element 

The VARIABLE CONSTANT, VK, element is 
programmed in the same manner as the CONSTANT, 
K, element. In both cases, the element's output remains 
constant during a solution. The constant stored by the 
VK element is recomputed between successive solutions. 
Consequently, the VK element utilizes only the 
terminal values of its possibly time varying inputs. If 
information available interior to a solution is needed to 
modify the next solution, it must be stored in a sample 
and hold element until the end of the solution, at which 
time it may be used by a VK element. 

The VK element presents a constant output for an 
entire solution, equal to its output on the previous run 
plus the sum of its first input, P2 times its second input, 
and P3 times its third input, all taken at the end of the 
previous solution. The constants P1, P2 and P3 are the block 
element's three parameters. Its output is set equal to 
P1 during the first solution. The VK element is similar 
to a mechanical ratchet or to an accumulator. 
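The update rule just described is small enough to state directly in code. The class below is an illustrative sketch, not MOBSSL source: the class and method names are hypothetical, and the usage lines assume a constant increment of 0.5 applied to a starting value of 1.0, in the spirit of a parameter swept from solution to solution.

```python
class VariableConstant:
    """Holds a constant during a solution and updates it between solutions."""

    def __init__(self, p1: float, p2: float = 0.0, p3: float = 0.0):
        self.p2 = p2
        self.p3 = p3
        self.output = p1          # output during the first solution is P1

    def end_of_solution(self, e1: float, e2: float = 0.0, e3: float = 0.0) -> float:
        """Accumulate the terminal input values; return the next output."""
        self.output += e1 + self.p2 * e2 + self.p3 * e3
        return self.output


vk = VariableConstant(p1=1.0)
history = [vk.output]
for _ in range(2):
    # The first input is a constant 0.5, so the output grows by 0.5 per solution.
    history.append(vk.end_of_solution(e1=0.5))
print(history)
```

Because the previous output is part of the sum, the element accumulates; this is the property that distinguishes it from a plain constant.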

The VK element is equivalent to an analog computer 
iterative accumulator. It allows solution to solution 
parameter variations. These may be systematic changes, 
random changes or solution dependent changes. For 
example, a frequency response can be implemented as 
shown in Figure 5. The VK block causes the radial 
frequency $\omega$ to be incremented by $k$ at the end of 
each solution. 

Consider the following parameter identification 
problem. Let 

$$\dot{y}_D + \alpha y_D = 1$$

be a first order system containing an unknown parameter 
$\alpha$. Let 

$$\dot{y} + \alpha_1 y = 1$$

be a model containing an adjustable parameter $\alpha_1$. 
Further, define an error measure 

$$e(t) = y_D(t) - y(t)$$

and a criterion function 

$$\phi(t) = \int_0^t e^2(\tau)\,d\tau .$$

Figure 4 - MOBSSL solutions for first iteration of damping ratio example 
(Listing and plot of four successive solutions. Columns: TIME; PARAMETER 1, BLOCK 2; CRITERION FCN, BLOCK 9; THING, BLOCK 30; MODEL OF THING, BLOCK 3. With parameter 1 at 0.4, the first solution computes $\phi(\alpha^i)$; at 0.6, the second computes $\phi(\alpha^i + \Delta\alpha^+)$; at 0.2, the third computes $\phi(\alpha^i - \Delta\alpha^-)$ and is followed by the steepest ascent parameter step; at 0.6455, the fourth computes $\phi(\alpha^{i+1})$. A summary of the previous 2N + 1 solutions follows, showing a stopping criterion of 0.003.) 

Figure 5 - Frequency response implementation through 
use of the VK element to increment $\omega$ 

The derivative of the criterion function with respect to 
the parameter $\alpha_1$ may be computed continuously during 
the solution. 


= 2e(t) ^ 

dai dai 

where 

dy(t) 

dai 



Figure 6— MOBSSL block diagram demonstrating use of 
VK block as iterative element in parameter search 
by discrete sensitivity difference equation technic 


is the solution of an associated differential equation 
described by Meissinger: 7 



The parameter adjustment algorithm is 



and 

$$\alpha^{i+1} = \alpha^{i} + \Delta\alpha^{i}, \qquad i = 1, 2, \ldots$$

This process is easily mechanized by the VK element. 
The incremental changes in the parameter $\alpha_1$ are 
accumulated from solution to solution. The MOBSSL 
program is seen in Figure 6. The computational results 
of this program, for an $\alpha$ of 1.0, are seen in Figure 7. 
The first four columns of tabular data are time, $y_D(t)$, 
$y(t)$ and $\partial y(t)/\partial\alpha_1$, respectively. The last column and the 
plot show the growth of the criterion function during 
the solution. The initial value of the adjustable parameter, 
$\alpha_1$, is 0.1. It becomes -0.5342, -0.5928, and 
-0.6365 in three successive solutions, converging 
towards -1.0. As can be seen from the plot, the error 
decreases considerably from solution to solution. 
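The search can be imitated with the sketch below, which should be read as an interpretation rather than a transcription of the MOBSSL program. The assumptions are: the influence coefficient $u = \partial y/\partial\alpha_1$ obeys $\dot{u} + \alpha_1 u = -y$, obtained by differentiating the model equation with respect to $\alpha_1$; the criterion is the integral of $e^2$; the parameter change for a solution is a gain times the integral of $e \cdot u$, which is a steepest descent step because $\partial e^2/\partial\alpha_1 = -2eu$; and the gain of 0.025, interval of 0.01 and total time of 4.2 are read from the listing in Figure 7. The sketch tracks $\alpha_1$ itself with a positive sign, so only the magnitudes are comparable with the values quoted above.

```python
def run_solution(alpha_model, alpha_true=1.0, gain=0.025, dt=0.01, t_end=4.2):
    """Integrate system, model, influence coefficient, criterion and step."""
    def deriv(state):
        y_d, y, u, _phi, _delta = state
        e = y_d - y
        return (
            1.0 - alpha_true * y_d,     # system:  y_D' + alpha   * y_D = 1
            1.0 - alpha_model * y,      # model:   y'   + alpha_1 * y   = 1
            -y - alpha_model * u,       # influence coefficient u = dy/d(alpha_1)
            e * e,                      # criterion: integral of e squared
            gain * e * u,               # accumulated parameter change
        )

    state = (0.0, 0.0, 0.0, 0.0, 0.0)
    for _ in range(round(t_end / dt)):  # fourth-order Runge-Kutta steps
        k1 = deriv(state)
        k2 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = deriv(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(
            s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state[3], state[4]           # final criterion, parameter change


alpha_1 = 0.1                           # initial value of the adjustable parameter
estimates, criteria = [], []
for _ in range(3):
    phi, delta = run_solution(alpha_1)
    alpha_1 += delta                    # the role played by the VK element
    criteria.append(phi)
    estimates.append(alpha_1)
print(estimates, criteria)
```

Each pass through the loop corresponds to one solution: the change accumulated during the run is added to the parameter only at the end, which is exactly the behavior the VK element provides.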


The STORAGE element 

The STORAGE element is very similar to the VK 
element, except that its output remains constant for 
one or more solutions depending on the block's parameters. 
During the first solution, the outputs of the 
STORAGE blocks are set equal to their first parameter's 
values. Thereafter they are set equal to the 
sum of their inputs. The change in output takes place 
every P3 solutions, when 

[(N - P2) MODULO P3] = 0 

For example, if P2 = 0 and P3 = 3, then the output of 
the storage block will be P1 during solutions one, two 
and three and will be reset to the sum of its inputs 
between solutions three and four. It will hold this 
output during solutions four, five and six, recomputing 
its output between solutions six and seven, etc. Notice 
that the STORAGE element does not accumulate in 
computing its new output. If P2 = 0 and P3 = 1 the 
STORAGE element is a non-accumulating VK element. 
In Figure 8, the S element is used to pass the 
output of integration element 1 from one solution to 
the next. During the first solution the output of the 
S element is 3.0. A more complex application is shown 
in Figure 9 where the output of the S element is used 
to modify the structure of the simulation from solution 
to solution. In odd numbered solutions, N = 1,3,5, 
etc., the output of the S element will be +1 and the 
output of RELAY element 4 will be —1. As with all 
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NORSK. VAF-- MERRITT'S UWN BLOC* SIRUCTUREO 

SIMULA? ION 

LANGUAGE t UNPRONOUNCEABLE ACRUNVN FUR...MK 1 I MOD 2 JAN 01 1969 

CONFIGURATION SPECIFICATIONS 





OUTPUT NAME 

BLOCK NUMRFR 

block type 

INPUT 1 INPUT ? INPUT 3 

thing 

L 

I 

1 3 

1 

O 

MODEL CIF THING 

2 

I 

1 3 

3 



3 

X 

2 

12 

0 

influence cueff. 

6 

I 

0 

2 

5 


5 

X 

12 



ERROR 

6 

♦ 

1 

-2 

0 

ERROR SQUARED 

7 

X 

6 



CRITERION FUNCTN 

H 

9 

I 

X 

7 

6 

4 

0 

GAIN 

10 

G 


0 

0 

DELTA ALPHA 

1 l 

1 

10 

0 


variable constnt 

12 

VK 

1 1 



INPUT 

13 

¥ 

0 

0 



INITIAL CHnUITIUNS 

AND PARAMETERS 


IC/PAR NAME 

BLOCK |C/mA«I 

PAR? 

PAR 3 


ALPHA 

1 

0.0 

1.00000 0 

.0 



2 

0.0 

I.Of'OOO 0 

.0 



4 

0.0 

1 .00009 1 

.ooooo 



10 

-O. 02 500 

0.0 0 

.0 



1 2 

-o. loooo 

0.0 0 

.0 



1 3 

1.00000 

0.0 0 

.0 



PROGRAM mode run 



integration interval is 

o.oiooo 



TOTAL TINE IS 

4.20000 




print INTERVAL IS 

0.30000 




BLOCKS TO BE PRINTED ARE 

1 

2 



BLOCK TO BE PLOTTED IS 

B RANGE OF PLOTTEO VARIABLE IS 

0.0 10.00000 


THING LODEL OF 1111)10 INF 111 31 CE OOEFF 

CRITERION KCN 


TINE 

BLOCK 1 BLOCK 2 BLOCK 4 

BLOCK 8 

0.0 10.000 

0.0 

0.0 

0.0 

0.0 

0.0 

♦ 1 

0.3000 

0.25910 

0.295 54 

-0.0441 1 

0.00008 

♦ I 

0. 6000 

0.451 19 

0.58235 

-0. 1 7296 

0.00221 

♦ I 

0.9000 

0.59343 

0. 86060 

-0.3H150 

0.01425 

♦ I 

1.2000 

0.69000 

1.13078 

-0.66490 

0.051 19 

♦ 1 

1.5000 

0. 77687 

1. 39207 

-1 .01856 

0. 13400 

♦ • I 

I. 8000 

0.8 V«70 

1 .64722 

-1.43805 

0.28767 

1 ♦ I 

2.1000 

0.87754 

1.89406 

-1.91919 

0.53936 

I -♦ I 

2.4000 

0.90928 

2.13359 

-2.45797 

0.91691 

1 + I 

2.7000 

0.93279 

2. 36605 

-3.05057 

1 .44768 

1 — — — + I 

3.0000 

0.95021 

2.59164 

-3.69333 

2.15788 


3.3000 

0.9631 1 

12.81056 

-4.38276 

3.07207 


3.6000 

0.97267 

3.02301 

-5. 11556 

4.21287 


3.9000 

0,97975 

3.22918 

-5.88855 

5.60087 


4.2000 

0.90500 

3.42926 

-6.69872 

7.25461 


4.2100 

0.90515 

3.43503 

-6.72633 

7.31452 


THE VALUE OF VARIABLE CUNST ANT WITH BLOCK 

NO. 12 IS 


-0.5342 AT END OF RUN 1 

RUN TERMINATED 

AT TIME LOUAL 

TO TOTAL TIME 



0.0 

0.0 

0.0 

0.0 

0.0 

+ I 

0.3000 

0.25910 

0.27719 

-0.04047 

0.00002 

* I 

0.6000 

0.45119 

0.51 334 

-0.14579 

0.00051 

♦ 1 

0.9000 

0.59343 

0.71452 

-0.29587 

0.00307 

+ 1 

1.2000 

0.69000 

0.88591 

-0.47514 

0.01027 

♦ 1 

1.5000 

0. 77687 

1.03192 

-0.67167 

0.02505 

♦ I 

1.8000 

0.8 3470 

1 .15028 

-0.87641 

0.05014 

♦ I 

2.1000 

0.87754 

1.26224 

-1.08261 

0.00 772 

4 I 

2.4000 

0.90928 

1 .35250 

-1.28534 

0.13932 

+ 1 

2.7000 

0.93279 

1.42939 

-1.401 1 1 

0.20576 

1+ 1 

3.0000 

0.95021 

1 .49490 

-1.66756 

0.28727 

1 ♦ 1 

3.3000 

0.9631 l 

1.55071 

-1.84313 

0.38362 

I -♦ 1 

3.6000 

0.97267 

l .59826 

-2,00698 

0.49410 

I -+ 1 

3.9000 

0.97975 

1.63H76 

-2.15871 

0.6101 1 

| — + I 

4.2000 

0.98500 

1 .67327 

-2.29832 

0.75440 

1 ♦ s 

4.2100 

0.9B515 

1.67432 

-2.30277 

0.75915 

I + I 

THE VALUE OF VARIABLE CONSTANT WITH BLOCK 

NO. 12 IS 


-0.5928 AT END OF RUN 2 

RUN TERMINATED 

AT TIME fcUUAl 

to total time 



0.0 

0.0 

0.0 

0.0 

0.0 

+ I 

0.3000 

0.25918 

0.27484 

-0.04000 

0.00002 

+ 1 

0.6000 

0.45119 

0.50490 

-0. 14251 

0.00038 

♦ I 

0.9000 

0.59343 

0. 69740 

-0.2 860 9 

0.00228 

+ 1 

1.2000 

0.69880 

0.H5869 

-0.45464 

0.00757 

♦ I 

1.5000 

0. 7768 7 

0. 99363 

-0.6362 l 

0.01829 

4 1 

1.0000 

0.83470 

1 .10657 

-0. 02209 

0.03631 

4 t 

2. 1000 

0.87754 

1.20111 

-1 .00604 

0.06303 

4 I 

2.4000 

0.90928 

1 .28024 

-1.18375 

0.09934 

4 I 

2.7000 

0. 93279 

1. 34648 

-1 .35237 

0.14565 

+ 1 

3.0000 

0.95021 

1 .40193 

-1.51015 

0.20196 

1+ 1 

3.3000 

0.9631 1 

1.44835 

-1 .65617 

0.26793 

I + 1 

3.6000 

0.97267 

1 .48721 

- 1 . 79004 

0. 34302 

1 + I 

3.9000 

0.97975 

1.51973 

-1.91187 

0.42653 

!-♦ I 

4.2000 

0.98500 

! 1 .54696 

-2.02202 

0.51770 

1-+ 1 

4.2100 

0.9«515 

1,54/79 

-2.02549 

0.52086 

1 -♦ I 

THE VALUE OF VARIABLE CUNSTANt WITH BLOCK 

NI). 1? IS 


-0.6365 AT END OF RUN 3 

RUN TERMINATED 

AT TIME EQUAL 

tf) TOTAL TIME 



0.0 

0.0 

0.0 

0.0 

0.0 

♦ I 

0.3000 

0.25918 

0.27309 

-0.03966 

0 .00001 

4 I 

0.6000 

0.45119 

0.49H72 

-0.14011 

0.00030 

+ 1 

0. 9000 

0.59343 

0.68512 

-0.27903 

0.001 7 R 

♦ * 

Figure 7 — MOBSSL data and computational results from first three iterations of parameter search by sensitivity equation method

Figure 8 — Use of storage element to store and transfer
integrator output between simulations during
iterative operation

Figure 9 — Storage element used to modify structure
of simulation from one solution to next


block structured languages, the sorting algorithms
experience difficulty with purely algebraic loops. The
STORAGE-RELAY element loop is rendered sortable
by the inclusion of the UNIT DELAY element. The
UNIT DELAY has no effect on the computations,
and the output of RELAY element 3 will be $f_1(x)$.
During even numbered solutions the output of the
STORAGE element will be -1 and the output of
RELAY element 3 will be $f_2(x)$.

Hybrid computational elements 

MOBSSL-UAF has been developed for use in the
USC System Simulation Laboratory. The System
Simulation Laboratory’s computer complement is
shown in Figure 10.

Figure 10 — USC system simulation laboratory computer
complement
[Diagram label: Beckman 2132 analog computer]

The software and hardware linkage between the
computer graphics terminal and the IBM 360 is not
yet installed. The analog computer is equipped with
a multi-channel strip chart recorder, one and two pen
x-y recorders, as well as oscilloscopes and memoscopes.
Software presently exists to allow the digital computer
to carry out the following interface operations:

a. digital to analog conversions 

b. analog to digital conversions 

c. read discrete data lines 

d. set discrete output lines 

e. control the mode of the analog computer 

f. operate the analog computer’s select system

g. process external interrupt signals 

h. set potentiometers in the analog computer 

As yet, only the first two functions, analog to digital
and digital to analog conversions, with element
designations of AD and DA respectively, are available
within MOBSSL. MOBSSL programs may contain up
to 10 DA elements and up to 32 AD elements, limited
only by the available hardware. The hybrid elements
may be used separately as I/O elements or together
as part of a closed loop hybrid operation.

The DA element is often used as an output element
in MOBSSL simulations. As shown in Table I, the
DA element causes a voltage, equal to its first input, to
appear at the output of the digital to analog converter
selected by its first parameter. If the input exceeds
±100.0, the output voltage will not be correct. DA
elements may be used to drive recording devices in
order to obtain graphical presentations of MOBSSL
results. Because computation times and input-output
times vary widely, the amount of real time between
successive outputs will not be constant during a
solution unless special timing routines are used.
There are several ways of getting around this:

1. Use the graphical results qualitatively and obtain
quantitative results from the printer listing.

2. If a multichannel strip chart recorder is used,
place a known function of time on one channel
and derive timing information from it. The
independent variable, sine waves, output of
timing elements, etc., are convenient signals.

3. If an x-y plotter is used, place the independent
variable, the output of block 201, on one axis.

4. Plot two dependent variables against each
other, in which case no timing information
is required.

Methods 3 and 4 are used in the example described 
below. 

The AD element type is useful for changing parameters
and initial conditions. As shown in Table I, the
input to an AD block is supplied by an ADC located
on the analog computer patchboard. Parameter 1 of
the AD block determines the ADC number. The AD
block output is a floating point number between
±100.0. If the input exceeds ±100.0 volts, the output
of the AD block will be incorrect. On-line parameter
changes can be achieved by connecting the outputs of
manually operated potentiometers to the input of an
A-D converter as shown in Figure 11a. Figure 11b
demonstrates the use of the AD block to permit
on-line adjustment of constants and coefficients appearing
in MOBSSL block diagrams. Figure 11c demonstrates
the use of the AD block to allow on-line changes in
integrator initial conditions. This is valid since the
output of an integrator is:

$$e_0(t) = e_0(0) + \int_0^t e_{in}(t)\,dt$$

and $e_0(0)$ can be any number summed with the output
of an integrator having zero as its “initial condition.”
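
The identity can be checked numerically. The following is a minimal sketch only: the rectangular-rule integrator and every name in it are illustrative assumptions and do not represent MOBSSL's own integration routine.

```python
def integrate(e_in, ic, dt, steps):
    """Rectangular-rule integrator (assumed for illustration only).

    Returns the integrator output after each step, starting from the
    initial condition `ic`.
    """
    out, y = [], ic
    for k in range(steps):
        y += e_in(k * dt) * dt
        out.append(y)
    return out


def ramp(t):
    """Arbitrary test input e_in(t)."""
    return 2.0 * t


dt, steps, ic = 0.1, 50, 7.5

# Integrator started from the initial condition directly ...
with_ic = integrate(ramp, ic, dt, steps)
# ... versus a zero-IC integrator whose output is summed with the constant.
zero_ic_plus_sum = [ic + y for y in integrate(ramp, 0.0, dt, steps)]

max_diff = max(abs(a - b) for a, b in zip(with_ic, zero_ic_plus_sum))
```

The two output sequences agree to within floating point rounding, which is why an AD block feeding a summer can stand in for an adjustable initial condition.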

Figure 11 — Use of AD element for on-line parameter
changes: (a) analog patchboard hookup of a manually
operated three terminal pot; (b) MOBSSL configuration
for variable constant and variable gain; (c) MOBSSL
configuration for integrator IC

Figure 12 — AD block used to modify a parameter at the
beginning of a run

When MOBSSL is being used in an iterative mode,
on-line adjustments are needed only at the beginning
of a solution. Parameter variations during the solution
are undesirable. This may be achieved by using the
zero order hold as a sample and store element. When
input number 2 to the ZOH element is less than or
equal to zero, it holds its previous output. When it is
positive, it samples, stores and holds the present input.
In the example shown in Figure 12, the ADC is effective
only during the first second of the solution, after which
it may be pre-set in preparation for the next solution.
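
The hold behaviour described above can be sketched as follows. This is an illustration under stated assumptions: the class name, the initial output of zero, and the control signal `1.0 - t` (positive only during the first second) are hypothetical and are not taken from the MOBSSL source or from Figure 12.

```python
class ZeroOrderHold:
    """Sample-and-store element: samples while the control input is positive."""

    def __init__(self, initial_output=0.0):
        self.output = initial_output  # assumed starting value

    def step(self, signal, control):
        # Control > 0: sample and store the present input.
        # Control <= 0: keep holding the previous output.
        if control > 0.0:
            self.output = signal
        return self.output


zoh = ZeroOrderHold()
times = [0.0, 0.5, 1.0, 1.5]
adc_readings = [12.0, 12.5, 40.0, -3.0]   # pot moved after t = 1

# Hypothetical control input: positive only while t < 1 second.
held = [zoh.step(v, 1.0 - t) for v, t in zip(adc_readings, times)]
```

Readings taken after the first second no longer reach the simulation, so the potentiometer can be reset for the next solution without disturbing the current one.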

An AD element can be used as the input to a QUIT 
block to terminate a run from the analog console. 

Other applications of the AD element include sampling
and processing of analog data where synchronous
sampling is not required. The output of the gaussian
noise generator, both direct and filtered, located in
the Beckman Analog Computer, may be sampled and
used in place of the output of the uniform distribution
Random Number Generator block type.

Attempts to use the hybrid block elements in real
time applications have brought to light the need for a
whole series of timing and interrupt processing
elements. These elements will expand the real time
capability of MOBSSL considerably.

The following example, often referred to as the
Host-Parasite problem, demonstrates the use of the DA
block to drive an X-Y plotter. It is a set of differential
equations which represents the population of hosts and
parasites as a function of time. The physical situation
from which the differential equations are abstracted
comes about when there is a host (i.e., food for a
parasite) which would reproduce at a known rate if
there were no parasites. The parasites die off at a known
rate if there are no hosts. Finally, a decrease in the
number of hosts and an increase in the number of
parasites is a function of the number of “encounters”
between hosts and parasites. Whenever a host is
unlucky enough to encounter a parasite, the parasite
eats him up. The equations implemented are:

$$\dot{H} = K_1 H - K_4 H P$$
$$\dot{P} = K_2 P + K_3 H P$$

$H$ = host population as a function of time

$P$ = parasite population as a function of time

$K_1$ = overall growth rate of hosts per hour,
assuming no parasites

$K_2$ = overall decay rate of parasites per hour,
assuming no hosts

$K_3$, $K_4$ = number of host-parasite encounters per
hour

$t$ = time in hours

where

$K_1 = 0.05$/hour (+5% per hour)

$K_2 = -0.10$/hour (-10% per hour)

$K_3 = 2 \times 10^{-4}$/host-hour

$K_4 = 2 \times 10^{-4}$/parasite-hour

($K_3$ and $K_4$ correspond to one encounter per 5000 hours
for every host-parasite pair.)



Figure 13 — MOBSSL block diagram for the host-parasite 
problem 


Initial Conditions:

        Run I    Run II    Run III    Run IV

H(0)    100      1200      600        500

P(0)    200      1200      500        250

The MOBSSL diagram is shown in Figure 13; a listing
of the MOBSSL configuration specifications, parameters
and other simulation data is shown in Figure
14. Figure 15 is a graph of hosts vs. time and parasites
vs. time obtained using the PCHG mode and
interchanging parameter 1 of blocks 6 and 8 on the second
run. Time is obtained from DA block 9, appropriately
scaled by gain block 4 from block 201, which provides
the independent variable. Figure 16 is a phase plane
plot of hosts vs. parasites for four sets of IC’s. DA 6
provides the input for the plotter’s X axis and DA 8
drives the plotter’s Y axis. Note that the existence of
closed orbits for all physically realizable IC’s is clearly
demonstrated, as well as the existence of a stationary
point at (H,P) = (500,250).
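
The host-parasite run can be reproduced approximately outside MOBSSL. The sketch below is an illustration under assumptions: it uses a classical fourth-order Runge-Kutta step (the integration method used by MOBSSL is not stated here), it writes the parasite decay term as K2·P with K2 = -0.10, consistent with the PAR3 value of -0.10000 listed for the parasite integrator in Figure 14, and all function names are hypothetical. The integration interval (0.1) and total time (318) are those of the Figure 14 listing.

```python
import math

# Rate constants from the text; K2 is taken as negative (decay).
K1, K2, K3, K4 = 0.05, -0.10, 2e-4, 2e-4


def derivs(h, p):
    """Right-hand sides of the host and parasite equations."""
    return K1 * h - K4 * h * p, K2 * p + K3 * h * p


def rk4_step(h, p, dt):
    """One classical Runge-Kutta step (assumed method, not MOBSSL's)."""
    k1h, k1p = derivs(h, p)
    k2h, k2p = derivs(h + 0.5 * dt * k1h, p + 0.5 * dt * k1p)
    k3h, k3p = derivs(h + 0.5 * dt * k2h, p + 0.5 * dt * k2p)
    k4h, k4p = derivs(h + dt * k3h, p + dt * k3p)
    return (h + dt * (k1h + 2 * k2h + 2 * k3h + k4h) / 6,
            p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6)


def simulate(h0, p0, dt=0.1, total_time=318.0):
    """Return a list of (time, hosts, parasites) samples."""
    steps = round(total_time / dt)
    h, p = h0, p0
    trajectory = [(0.0, h, p)]
    for n in range(1, steps + 1):
        h, p = rk4_step(h, p, dt)
        trajectory.append((n * dt, h, p))
    return trajectory


def orbit_invariant(h, p):
    """Quantity that stays constant along an exact solution (closed orbit)."""
    return K3 * h + K2 * math.log(h) + K4 * p - K1 * math.log(p)


run_1 = simulate(100.0, 200.0)   # Run I initial conditions
```

Both derivatives vanish at (H,P) = (500,250), which is the stationary point noted above, and the near-constancy of `orbit_invariant` along `run_1` is the numerical counterpart of the closed orbits seen in the phase plane plot. For comparison, the Figure 14 listing reports roughly 111.5 hosts and 124.5 parasites at 6 hours for Run I.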

Disk input and disk output elements 

Through the use of the Disk Input (DI) and Disk
Output (DO) blocks, vector functions of the
independent variable may be respectively read out of and
written into previously allocated data sets on disk
storage during a simulation. The DI block is used when
MOBSSL-UAF — MERRITT'S OWN BLOCK STRUCTURED SIMULATION LANGUAGE. UNPRONOUNCEABLE ACRONYM FOR... MK II MOD 2 JAN 01 1969

CONFIGURATION SPECIFICATIONS

OUTPUT NAME        BLOCK NUMBER   BLOCK TYPE   INPUT 1   INPUT 2   INPUT 3
MULTIPLIER         1              X            25        27        0
HOSTS              25             I            0         1         25
PARASITES          27             I            0         1         27
HOST SCALING       26             G            25        0         0
PARASITE SCALING   28             G            27        0         0
HOST DAC           6              DA           26        0         0
PARASITE DAC       8              DA           28        0         0
TIME SCALING       4              G            201       0         0
TIME DAC           9              DA           4         0         0

INITIAL CONDITIONS AND PARAMETERS

IC/PAR NAME        BLOCK   IC/PAR 1     PAR 2      PAR 3
PARASITE IC        27      200.00000    0.00020    -0.10000
HOST IC            25      100.00000    -0.00020   0.05000
HOST SCALING       26      0.06000      0.0        0.0
PARASITE SCALING   28      0.06000      0.0        0.0
HOST DAC NUMBER    6       2.00000      0.0        0.0
PARASITE DAC NO.   8       1.00000      0.0        0.0
TIME DAC NUMBER    9       4.00000      0.0        0.0
TIME SCALING       4       0.20000      0.0        0.0

PROGRAM MODE STOP
INTEGRATION INTERVAL IS 0.10000
TOTAL TIME IS 318.00000
PRINT INTERVAL IS 6.00000
BLOCKS TO BE PRINTED ARE 4 1 27
BLOCK TO BE PLOTTED IS 25
RANGE OF PLOTTED VARIABLE IS 1640.00000

[Tabulated output follows at 6-hour print intervals from TIME = 0.0 to the end of the run, with a line printer plot of block 25 scaled from 0.0 to 1640.0. Most numerical entries are not legibly reproduced; rows that can be read reliably include:]

TIME       T SCALING   MULTIPLIER    PARASITES   HOSTS
           BLOCK 4     BLOCK 1       BLOCK 27    BLOCK 25
0.0        0.0         20000.00000   200.00000   100.00000
30.0000    6.00000     7319.06250    26.29605    278.33325
317.9998   63.59995    31267.78516   312.05664   100.19908

RUN TERMINATED AT TIME EQUAL TO TOTAL TIME
STOP 0
END OF JOB

Figure 14 — MOBSSL printer listing for the host-parasite problem

Figure 15 — X-Y plotter graph of hosts and parasites vs
time

Figure 16 — Phase plane plot of hosts vs parasites for
four sets of initial conditions

Figure 17 — Disk input and disk output block configurations


a data set stored on a disk serves as an input to the
MOBSSL simulation. These data could be stored at
any prior time, including the immediately preceding
simulation during the present job if MOBSSL is in an
iterative simulation mode. The DO block is used
when it is desired to write block outputs into a data
set stored on a disk. The user therefore has the
necessary tools to

1. Provide complex previously obtained vector
valued inputs to a MOBSSL simulation.

2. Store vector valued MOBSSL time histories for
future use.

3. Perform various functional optimization
techniques such as quasilinearization in which the
previous time history serves as data for the
present solution.

Up to 10 DI blocks connected as shown in Figure
17a are permitted. DI block numbers must be sequential
and ordered to correspond to stored disk data sequence,
i.e., the lowest block number corresponds to the first
block output stored in a record, next to lowest block
number to the second block output stored in a record,
etc.

Similarly, up to 10 DO blocks connected as shown in
Figure 17b are permitted. DO block numbers must be
sequential and ordered to correspond to desired disk
data sequence, i.e., the output of the DO with the
lowest block number corresponds to the first variable
stored in each disk data set record, etc.

Disk read and write time intervals are independently
determined by two user supplied entries on a Sample
Time card, which also includes the line printer time
interval. DI block outputs remain the same until the
number of complete MOBSSL integration cycles since the
most recent disk read, multiplied by the integration
interval, equals the disk read time. Similarly, writing
onto the disk occurs only when the accumulated MOBSSL
time interval since the most recent disk write is equal
to the disk write time.
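
The timing rule above can be sketched with an integer cycle counter, which avoids comparing accumulated floating point times. This is a hypothetical illustration: the class and function names are invented here, and two points are assumptions rather than statements from the text, namely that the transfer interval is a whole multiple of the integration interval and that the first record is available at time zero.

```python
class DiskSchedule:
    """Counts integration cycles and reports when a disk transfer is due."""

    def __init__(self, integration_interval, transfer_interval):
        cycles = transfer_interval / integration_interval
        self.cycles_per_transfer = round(cycles)
        # Assumption: the transfer interval is a whole number of cycles.
        if (self.cycles_per_transfer < 1
                or abs(cycles - self.cycles_per_transfer) > 1e-9):
            raise ValueError("transfer interval must be a whole multiple "
                             "of the integration interval")
        self.cycles_since_transfer = 0

    def advance(self):
        """Call once per completed integration cycle; True means transfer now."""
        self.cycles_since_transfer += 1
        if self.cycles_since_transfer == self.cycles_per_transfer:
            self.cycles_since_transfer = 0
            return True
        return False


def read_with_hold(records, integration_interval, read_interval, n_cycles):
    """Return the DI outputs seen on each cycle; values hold between reads."""
    schedule = DiskSchedule(integration_interval, read_interval)
    source = iter(records)
    current = next(source)          # assumed: first record read at time zero
    outputs = []
    for _ in range(n_cycles):
        outputs.append(current)
        if schedule.advance():
            current = next(source, current)   # hold last record at end of data
    return outputs


# Each record holds one value per DI block, lowest block number first.
records = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
outputs = read_with_hold(records, integration_interval=0.1,
                         read_interval=0.3, n_cycles=7)
```

With an integration interval of 0.1 and a read interval of 0.3, each record is held for three cycles before the next one is fetched. The same schedule object could gate DO writes.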













274 Fall Joint Computer Conference, 1969 


Graphic MOBSSL 

The man-simulation language interface can be 
improved considerably through the application of 
computer graphics terminals. These devices allow 
block diagrams and equations to be manipulated on 
the screen of a cathode ray tube using light pens, 
tablets and alpha-numeric keyboards. When the 
problem definition is completed, the simulation 
language is called to generate the desired solutions. 
As the solutions are computed, they are displayed by 
the computer graphics terminal. 

Two graphics programs are presently under develop- 
ment at USC. The first, a graphic block diagram editing 
program, allows the user to construct MOBSSL dia- 
grams on the screen of the graphics terminal. The 
second, a differential equation editing program, allows 
systems of equations to be drawn on the screen. Prior 
to execution of MOBSSL, the equations are translated 
into a MOBSSL block diagram. The user may view 
the resultant diagram or immediately enter the 
MOBSSL program. Subsequent editing operations 
may be carried out on either the equations or the 
corresponding block diagram. 

These two problem preparation programs provide 
users with an extremely flexible communication inter- 
face. The speed with which large amounts of instruc- 
tional and reference material may be displayed makes 
it possible to operate these programs with almost no 
prior instruction. 

Future plans for MOBSSL 

Future developments of MOBSSL will be directed
toward improved man-computer communications.
These will take the form of additional process oriented
block elements, additional parameter and functional
optimization and identification procedures, timing and
interrupt processing elements, and expanded graphics
facilities.
A hybrid computer programming
system


by M. A. FRANKLIN and J. C. STRAUSS 

Carnegie-Mellon University 
Pittsburgh, Pennsylvania 


INTRODUCTION 

In order to analyze and subsequently synthesize 
complex systems, engineers have increasingly turned to 
computer simulation techniques. Until recently, simu- 
lation techniques could generally be divided either 
on the basis of the type of computer to be used, or 
the type of system to be simulated. 

Of the two types of computers available, the analog 
computer was usually restricted to simulation of con- 
tinuous systems (i.e., systems described by sets of 
differential equations), while the digital computer was 
used primarily in simulation of discrete event prob- 
abilistic systems. As numerical integration techniques 
improved and digital computer speed increased, how- 
ever, it became clear that it was possible to solve sets 
of differential equations and thus simulate continuous 
systems on the digital computer. Many digital com- 
puter programming systems have been designed for 
use in such simulations. 

Associated with digital computer simulation there
thus arose two types of simulation languages: one for
describing discrete event probabilistic systems,¹,² and
one for describing continuous systems.³,⁴ The main
concern of this paper is with continuous systems
simulation.

With two types of computers available for simulating
continuous systems, the question of which type is
“best” presents itself. The answer is often unclear
and is certainly dependent on the system to be
simulated and the definition of the word “best.”⁵ In general
the digital computer has the advantages of high
accuracy combined with low problem setup costs and the
availability of extensive digital computer logic and
automatic programming facilities. It has the
disadvantages of limited problem solution speed (dynamic
response) and limited man/machine interaction
capability. The analog computer, on the other hand, has
the advantages of high dynamic response and good
man/machine interaction with the disadvantages of
limited accuracy and negligible automatic problem
setup and programming facilities.

More recently with the advent of the hybrid com- 
puter, an attempt has been made to combine in one 
computer system the advantages of both the analog 
and digital computers. Unfortunately, in addition to 
the advantages, some of the disadvantages have been 
combined and some entirely new problems have arisen. 
At this point, however, it appears that with the added 
facilities available in a hybrid computer system these 
problems can be overcome, and an overall improve- 
ment in continuous system simulation capabilities 
achieved. 

One of the main problems encountered with hybrid
computers is that of effective system utilization. The
process of effecting hybrid simulation studies can be
described in the four phases presented in Table I.
It has been found, from experience, that the man-hours
spent in off-line problem preparation and on-line
problem setup and debugging often exceed the man-hours
spent in determining solution methods and
actually performing the simulation. In addition, it is
often the case that more computer time is spent during
the setup and debugging process than is spent during
problem execution. These disadvantages derive mainly
from the analog computer part of the programming
and have, in general, carried over to the hybrid
environment. In part the reason for this is that, unlike
the all digital computer system, the hybrid computer
system lacks a coherent overall programming
framework in which to operate.

TABLE I — Phases in preparing a problem for
hybrid computer solution

1. Defining the Problem and Determining the
Solution Method

a. Equations which govern the system

b. Parameters to be varied in the simulation
study

c. General solution procedure

2. Off Line Problem Preparation

a. Allocating the problem to analog or digital
parts of hybrid

b. Scaling the equations

c. Determining the static check values

d. Allocating components to the analog
patchboard

e. Allocating interface linkages

f. Developing the necessary digital programs

3. On Line Preparation and Debugging of Problem

a. Wiring the patchboard

b. Loading the digital computer programs

c. Static checking of analog and interface
components

d. Dynamic checking of analog and interface
components

4. Performing the Simulation Study

a. Run time interaction: changing parameters,
terminating the problem, etc.

b. Run time diagnostics

c. Run time documentation

Several programming systems have been developed
in the past to help automate the tasks of problem
preparation, setup and debugging. The most prominent
among these are the APACHE⁷,⁸,⁹ and HYTRAN¹⁰
systems. Both of these systems have had their
drawbacks. The APACHE system has had limited
operational success in this country¹¹,¹² due to a combination
of poor documentation and a lack of clearly defined
program modularity. More fundamental criticisms of
APACHE relate to its inability to handle either the
extensive parallel logic capabilities available on many
current analog computers or the variety of interface
elements available on hybrid computers. In addition,
the lack of a macro-programming capability, the
difficulties in extending APACHE for use on machines
other than the IBM 7090 and PACE 231-R, and the
general lack of a suitable overall hybrid programming
framework appear to make modification of the
APACHE system for current hybrid computers an
uneconomical, if not impossible, venture. The HYTRAN
system, which performs a subset of those tasks
handled by APACHE, has also had limited success.
In this case the reason appears to be that, for most
users, the time required to prepare the large quantity
of input information demanded by the system did not
justify the system’s return.

In this paper a hybrid programming system is 
proposed which attempts to avoid the drawbacks of 
the APACHE and HYTRAN systems while creating 
an overall programming framework in which to view 
hybrid computer operations. The system is presented 
in terms of four distinct language levels and the pro- 
cessing between these levels. The language levels range 
from a highest level machine independent “source 
language” for general problem representation, to a 
“machine language” which represents the actual hy- 
brid computer implementation. The processing between 
the language levels represents clearly defined tasks 
which can be modularly implemented in successive 
stages. At present, the two lowest levels, the “assembly
language” and the “machine language,” have been
completely specified and the processing between them
has been implemented for an EAI 680⁶ Analog/Logic
Computer. Some of the specification details are
presented and a programming example involving the
simulation of an automobile suspension system is provided.

Characteristics and concepts of hybrid programming systems

This section considers characteristics which an effec- 
tive hybrid programming system should possess and 
some concepts which aid in effecting such a system. 

1. Language Levels : One of the keys to the successful 
development of effective all digital computer systems 
has been the familiar concept of language levels. 
Typically the lowest level, a machine language, is 
present for describing the most basic operations which 
can be performed on the computer in terms of the 
actual performing computer hardware. The next higher 
level language, an assembly language, deals with these 
basic operations, using mnemonics and symbolic no- 
tation. At a still higher level a source language consists 
of symbolic operations which represent combinations 
of basic operations. In the few existing analog oriented 
systems this approach has been avoided and there 
has been an emphasis on higher level language develop- 
ment. This has been done without developing clear 
machine and assembly languages for the representation 
of analog operations. 
Part of the reason for this is that machine language
representations of digital problems have direct
operational meaning in terms of sequencing and logical
decoding of digital instructions, while such
representations for an analog computer, at least at present,
have only symbolic significance. Symbolic
representational ability is, however, important both from a
conceptual and practical point of view. In particular,
one of the principal criticisms of APACHE was its
inability to respond to the analog programmer’s ability
to think up new and sometimes strange circuit patching.

A language level approach to a hybrid programming 
system which provides the ability on the lowest level 
to represent all legal analog computer patching thus 
seems desirable. 

2. Language Level Mixing : Provision should be made 
for intermixing language levels. While writing in the 
source language it should be possible to mix in assembly 
or machine language statements, thus enabling the 
specification of particular circuit patching for special 
situations. 

3. Macro Capability : At each language level, except 
the lowest, a macro generation capability should be 
present. At the source language level this capability 
would provide for defining new operators and algebraic 
or logical subroutines. At the assembly language level 
this capability could be used to define often used 
instruction sequences in the usual digital computer 
programming manner, or often used component 
groupings on the analog computer. Thus, for example,
the assembly language instructions which correspond
to an amplifier, resistance network, pot and reference
voltage might be combined in a macro to define an
Integrate operator.
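
A macro of this kind can be sketched as a simple text expansion. Everything in the example is hypothetical: the mnemonics AMP, NET, POT and REF, the `name=value` argument syntax, and the function names are invented for illustration and are not the assembly language specified in this paper.

```python
# Hypothetical macro table: one source-level operator expands into the
# component statements named in the text (amplifier, resistance network,
# pot and reference voltage).
MACROS = {
    "INTEGRATE": [
        "AMP {name}_A",
        "NET {name}_N, {name}_A",
        "POT {name}_P, {gain}",
        "REF {name}_R, {ic}",
    ],
}


def expand(statement):
    """Expand one statement; non-macro statements pass through unchanged."""
    op, *args = statement.split()
    if op not in MACROS:
        return [statement]
    params = dict(arg.split("=", 1) for arg in args)
    return [line.format(**params) for line in MACROS[op]]


def expand_program(lines):
    """Expand every statement of a program in order."""
    expanded = []
    for line in lines:
        expanded.extend(expand(line))
    return expanded


program = [
    "INTEGRATE name=X1 gain=0.5 ic=1.0",
    "POT P07, 0.25",          # lower-level statement mixed in directly
]
assembly = expand_program(program)
```

Because statements that are not macros pass through untouched, the same mechanism also accommodates the language level mixing described in item 2.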

4. Interactive Mode : In order to maintain and enhance 
the interactive features naturally available on the 
analog part of the hybrid a run-time interactive mode 
should be present. Table II lists some of the functions 
which this mode should possess. Ideally run-time 
interaction should be permitted at each of the language 
levels. Although the interactive mode is not detailed 
in this paper, it appears to be a natural extension of 
the proposed programming system. 

5. Multiprogramming Capabilities: At this time, no
attempt is made to specify a general multiprogramming 
capability. A limited ability to assign problems to 
nonoverlapping sections of the analog patchboard is, 
however, desirable given the cost of such boards and 
the time required for patching. Allocation of com- 
ponents in this manner allows several small problems 
to remain wired on the patchboard at the same time 
while running only one of them in the hybrid con- 
figuration. 

6. Diagnostics: One of the most useful features of
digital programming systems has been the availability 
of extensive programmed diagnostics. These diagnos- 
tics aid the programmer in both compile time and run 
time program debugging. Similar capabilities must be 
present in hybrid programming systems if an effective 
system is to be developed. 

TABLE II — Interactive mode functions

ANALOG

1. Change parameter values

a. pot settings

b. switch settings

2. Scale problem

3. Obtain static check values

4. Perform static check

5. Control computer status

a. mode

b. time scale

6. Change program structure

a. patching change

7. Read out analog computer state

a. amplifier outputs

b. pot settings

DIGITAL

1. Change parameter values related to
utility routines

a. integration step size

b. function generation parameters

2. Change program defined parameters

3. Change program structure

4. Initiate and terminate run

5. Read out digital computer status

In hybrid systems it is convenient to divide diagnostic
functions into four types: syntax, structure, set up
and dynamic. The first two types correspond to
compile time diagnostics and the last two types correspond
to run time diagnostics. Table III indicates some of
the diagnostic capabilities associated with each type.
The table is meant to be illustrative rather than all
inclusive and primarily defines a framework for
diagnostic functions in hybrid systems.
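
Two of the analog structure diagnostics listed in Table III (a connection symbol used as an input without being defined as an output, and an output connected to an output) can be sketched as a pass over a connection list. The data layout, component names and message wording below are assumptions made for illustration; they are not the representation used by the system described in this paper.

```python
def structure_diagnostics(components):
    """Check a patching description for two structural errors.

    `components` is a list of (name, input_symbols, output_symbol) tuples,
    a hypothetical stand-in for the assembly-level connection statements.
    """
    messages = []

    # Record which components drive each connection symbol.
    producers = {}
    for name, _inputs, output in components:
        producers.setdefault(output, []).append(name)

    # An output connected to an output: one symbol driven by several outputs.
    for symbol, names in producers.items():
        if len(names) > 1:
            messages.append(
                f"output connected to output: {symbol} driven by "
                + ", ".join(names))

    # A symbol used as an input but never defined as an output.
    for name, inputs, _output in components:
        for symbol in inputs:
            if symbol not in producers:
                messages.append(
                    f"{name}: {symbol} used as input but not defined as output")

    return messages


patching = [
    ("A01", ["X"], "Y"),      # X is never produced by any component
    ("A02", ["Y"], "Z"),
    ("P01", ["Z"], "Y"),      # second driver of Y
]
report = structure_diagnostics(patching)
```

A component count check against the available hardware, the second structure item in Table III, could be added to the same pass.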

It may be noted that there is nothing directly
comparable to setup diagnostics in all-digital
programming systems. This results from the assumption in
all-digital systems that once the problem is represented
correctly at any language level, a correct machine
implementation will result. This assumption would also
be correct for analog computers if an automatic
patching device were available.¹³

7. Documentation: As in digital systems,
documentation in hybrid systems can be divided into program
and execution documentation. Program documentation
includes general program listings, and for the analog
part, wiring lists, pot settings, etc. Execution
documentation is associated primarily with the interactive
mode of the system. Thus, changing pot settings, run
modes and the like are automatically documented when
the change is requested. Structure changes such as
adding an amplifier onto the patched problem should
also be documented. These changes, however, will
require separate entries into the digital computer
representation of the analog problem in addition to
performing the changes on the analog patchboard.
This is necessary if a current picture of the analog
patched problem is to be maintained in the digital
computer. In addition, general run-time
documentation of the digital programs currently being used, of
amplifier outputs and pot settings, and general analog
computer structure lists, should also be present.

8. Special Analog Computer Requirements: Certain
special compiler processing is required by the presence 
of the analog computer. Most of these special require- 
ments were implemented on the APACHE system. 
They are listed briefly below: 

a. Automatic Scaling: Automation of magnitude 
and time scaling is desirable. In addition, hybrid 
procedures for dynamic rescaling of hybrid problems 
to insure maximum accuracy should be provided. 

b. Component Allocation: Components should be
allocated automatically to the analog patchboard. The
facility for assigning particular components, or for
restricting the assignment of components to a particular
part of the board, should be available.

TABLE III - Hybrid diagnostics

            ANALOG                              DIGITAL

Syntax      1. Syntax Checking                  1. Syntax Checking

Structure   1. Connection symbol defined as     1. Program branches to
               input without being defined         nonexistent statement
               as output                           number
            2. Not enough components are        2. Program accesses
               available to implement              nonexistent memory or
               program                             subroutine
            3. An illegal configuration is
               encountered
               a. Logical signal feeding
                  analog component
               b. An output connected to
                  an output
               c. Reference voltage feeding
                  function relay
               d. Pot feeding function
                  generator

Set up      1. Static Checking

Dynamic     1. Overloaded amplifier             1. Register overflow

c. Static Checking: An automatic static check
procedure (setup diagnostic) is necessary which
generates static check values and subsequently performs an
on-line consistency check between these values and
those obtained from the patched problem.

d. Digital Simulation: Facility must be provided 
for digitally simulating parts of the problem which 
may eventually be implemented on the analog part of 
the hybrid. This is useful both in determining variable 
ranges for scaling purposes and in deciding which part 
of a problem is better implemented on the analog or 
digital subsection of the hybrid. 
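
The static check of item c can be pictured as a tolerance comparison between computed values and values read back from the patched problem. The following Python sketch is illustrative only: the function name `static_check`, the dictionary layout, the component names and the tolerance are assumptions made for the example and are not taken from APACHE or from the system proposed here.

```python
def static_check(expected, measured, tolerance=0.001):
    """Compare computed static check values with patchboard readings.

    expected, measured: dicts mapping a component name to its output in
    machine units (fractions of the reference voltage).
    Returns a list of diagnostic strings; an empty list means the check passed.
    """
    diagnostics = []
    for name in sorted(expected):
        if name not in measured:
            diagnostics.append(f"{name}: no reading obtained")
        elif abs(expected[name] - measured[name]) > tolerance:
            diagnostics.append(
                f"{name}: expected {expected[name]:+.4f}, read {measured[name]:+.4f}"
            )
    return diagnostics


# Hypothetical values: amplifier A02 disagrees with its computed output.
expected = {"A01": 0.2500, "A02": -0.4000, "P03": 0.1000}
measured = {"A01": 0.2503, "A02": -0.3100, "P03": 0.1000}
for line in static_check(expected, measured):
    print(line)
```

A failed comparison points to a patching or component fault rather than a language error, which is why this check has no counterpart in all-digital systems.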

9. Special Hybrid Requirements : 

a. Computer Allocation: Except for special cases, 
digital computer programming at the source language 
level does not usually involve the problem of computer 
resource allocation. For the analog computer pro- 
grammer the allocation problem is confined to com- 
ponent allocation on the analog patchboard and is 
usually performed after the method of solution has 
been determined. For the hybrid computer program- 
mer the question of allocation of computer resources 
can be viewed as one of the first questions to be 
answered. Very early in the problem formulation it must
be determined which part of the problem is to be imple- 
mented on the analog subsection and which part on 
the digital subsection. This decision affects both the 
speed and accuracy of problem solution. 

Assuming for the moment that criteria can be 
developed for determining the analog/digital imple- 
mentation division of a problem, it is useful to consider 
two types of source languages. The first is for general 
problem representation independent of implementation, 
and the second indicates the implementation division 
between the analog and digital subsections. 

Formulation of general computer allocation procedures
is extremely difficult. It appears that in many
instances the mathematical formulation of the problem 
contains certain prejudgments concerning the allo- 
cation problem. The area requires a good deal more 
study and is not considered further in this paper. 

10. System Implementation: Experience with APACHE
and other systems suggests that it is often
desirable to sacrifice something in run-time program
efficiency for the clarity gained by programming in a
higher level language. The resulting program usually
is more easily understood, and thus more easily adapted, by
persons not intimately acquainted with the system. For
the same reason system implementation should
emphasize a high degree of program modularity.

The programming system 

This section briefly discusses the language levels of 
a proposed hybrid programming system and the pro- 
cessing required between the levels. Figure 1 indicates 
the overall structure of the programming system. The 
programmer can enter the system by writing at the 
language level best suited to his needs. The language 
levels follow closely those defined for digital computer 
programming systems. The processing required between 
levels is outlined in Table IV. 

1. Hybrid Source Language (HSL):

In the Hybrid Source Language, the simulation problem 
is represented in a “natural” form, hopefully inde- 
pendent of its specific implementation. Such a language 
must have the ability to describe continuous systems 
and also have the general capabilities associated with 
higher level languages such as FORTRAN. CSSL 
(Continuous System Simulation Language 14 ), a language 
developed by Simulation Councils Inc., has both of 
these capabilities. In addition, its macro and program- 
mable structure features give it the great flexibility 
needed in a general simulation programming language. 
With some relatively small modifications, particularly 
in the control statement area, CSSL is taken here as 
the Hybrid Source Language. 

In addition to syntax checking, the main processing 
at this level relates to the computer allocation problem 
discussed earlier. 



Figure 1 - A hybrid computer programming system

TABLE IV - Processing between language levels

Language
Level to Level    PROCESSING OPERATIONS

1 (HSL)           1. Syntax Checking
  to              2. Allocation to analog and digital subsections of hybrid
2 (ASL)           3. Documentation

2 (ASL)           1. Syntax Checking
  to              2. Designation of variables as being generated on
3 (HAL)              analog or digital subsections of hybrid
                  3. Allocation of interface channels
                  4. Establishment of necessary interface routines
                  5. Compilation of digital part of program
                  6. Compilation of analog part of program
                     a. production of analog assembly statements
                  7. Simulation of analog part, if requested
                  8. Execution of digital parts to obtain timing
                     information, if requested
                  9. Documentation

3 (HAL)           1. Syntax Checking
  to              2. Macro Processing and Listbuilding
4 (HML)           3. Scaling Problem
                  4. Allocation of Analog Components
                  5. Production of Static Check Values
                  6. Documentation

2. Allocated Source Language (ASL):

The Allocated Source Language is effectively the same 
as HSL except that groups of instructions have been 
tagged to identify their implementation target. A 
modified form of CSSL may be used at this level. The 
main modification is in the addition of control state- 
ments ANALOG and DIGITAL which indicate the 
implementation of the block to follow. These statements
may be combined with the CSSL block structure
statements such as DYNAMIC and PROCEDURAL
to form statements such as ADYNAMIC and
DPROCEDURAL, which indicate that the statements in the
dynamic and procedural blocks which follow are to be
implemented on the analog and digital parts of the
hybrid respectively.

Additional control statements are necessary to ensure
effective programming of the hybrid. Among these are
statements specifying the analog output devices
required, requests for a digital simulation of an analog
implemented part of the program and statements
indicating that lower level language instructions are to
follow. A special control statement, TIME, is also
necessary at this level to request timing information
on digitally implemented blocks.

The main processing at this level involves the 
recognition of variables as being generated on the 
analog or digital part of the hybrid, the compilation 
of the separate analog and digital routines to produce 
assembly language versions of the separate parts and 
the establishment of interface routines to access and 
transmit those variables crossing the interface. It 
should be noted that compilation of the digital part 
of the program follows usual digital computer com- 
pilation procedures, the primary difference occurring 
when an analog generated variable value is needed. 
At this point either an interface utility routine is 
called to obtain the variable, or a table containing 
periodically updated variable values is accessed.
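
The two ways of obtaining an analog generated variable can be contrasted in a short Python sketch. Everything named here is hypothetical: the class `InterfaceTable`, its methods, and the `read_channel` callable standing in for an analog-to-digital converter read are assumptions for illustration, not routines of the proposed system.

```python
class InterfaceTable:
    """Table of analog-generated values refreshed at a fixed period."""

    def __init__(self, read_channel, channels):
        self._read = read_channel          # callable: channel number -> value
        self._channels = dict(channels)    # variable name -> channel number
        self._values = {}

    def refresh(self):
        """Sample every channel once; intended to be called periodically."""
        for name, channel in self._channels.items():
            self._values[name] = self._read(channel)

    def lookup(self, name):
        """Return the most recently sampled value (cheap, possibly stale)."""
        return self._values[name]

    def fetch(self, name):
        """Read the channel immediately, as an interface utility call would."""
        return self._read(self._channels[name])


# A dictionary stands in for the analog signals on the interface channels.
signals = {3: 0.25}
table = InterfaceTable(signals.__getitem__, {"X1": 3})
table.refresh()
signals[3] = 0.30            # the analog variable changes after the sample
print(table.lookup("X1"))    # value as of the last refresh
print(table.fetch("X1"))     # value read on demand
```

The table lookup avoids a conversion on every reference at the price of a value that may be one sampling period old; the direct call is current but costs a conversion each time.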

3. Hybrid Assembly Language (HAL):

The analog oriented part of the Hybrid Assembly 
Language represents the analog part of the problem 
in terms of the physical components on the analog
patchboard without being concerned with specific
component allocation or component patching; e.g., the 
connection between an amplifier and a potentiometer 
is specified without indicating the specific amplifier 
and potentiometer on the patchboard. The digital 
part of HAL is just the assembly language associated 
with the digital subsections; it is not discussed further 
here. 

Unlike the previous language levels, this language 
refers entirely to specific hardware components and as 
such is machine dependent. Currently such a language 
has been designed for use on the EAI 680 analog/logic 
computer. A translator for this language has been 
implemented in Fortran IV. The details of both 
language and translation are presented in the following 
section. The processing at this level is outlined in 
Table IV. 

4. Hybrid Machine Language ( HML ): The analog 
oriented part of the Hybrid Machine Language is the 
same as the analog part of HAL except that each 
instruction now contains a specific number attached 
to it indicating the particular component on the analog 
patchboard on which this instruction will be
implemented.

A user entering the programming system at this 
level can request all the processing associated with the 
previous level with the exception of component allo- 
cation which has already been performed. 

The hybrid assembly language 

This section discusses the details of the Hybrid
Assembly Language. The section is divided into two
parts. The first part describes Basic HAL, a symbolic
analog interconnection language, and the second
part describes Macro HAL, a macro language
extension to Basic HAL.

1. Basic HAL : Basic HAL consists of a set of in- 
struction types which enables the representation of 
analog components and their interconnections. The 
components are identified by mnemonic analog opera- 
tion codes referred to as micro operators. Thus, for 
example, POTU designates an ungrounded potentio- 
meter, while AMPL designates the amplifiers associated 
with the limit summer. A distinct micro operator is 
present for each analog and logic component available 
on the analog patchboard. 

Connections between components are specified 
through the use of common connection symbols simi- 
lar to a FORTRAN variable name. By using common 
symbols to represent the inputs and outputs of different 
components, connections between components are 
defined. 


The general format for a Basic HAL instruction is: 

(<output list>) = <analog micro operator>(<input list>)

The output list is a list of connection symbols (termed 
output symbols) which associate names to the various 
outputs of the component designated by the analog 
micro operation. A list is necessary since certain 
components such as multipliers and relays have several 
outputs. The input list may contain connection 
symbols, parameter identifiers or numeric parameter 
values. The following example illustrates the use of
four Basic HAL instructions in representing an
integrator. The two inputs to the integrator are represented
by X1 and X2, and the output by Y4.

(1) Y1 = REFP

(2) Y2 = POT(Y1,.5)

(3) (,Y3) = RCNC(X1,,,X2,,,)

(4) Y4 = AMPC(,Y3,Y2,Y4,,,,,,)

The first instruction specifies a positive reference
voltage which has the output symbol Y1. The second
instruction is a grounded potentiometer set to .5, and
fed by Y1. The output Y2 provides the initial condition
for the combination amplifier specified in instruction
(4). Instruction (3) specifies a resistance network
associated with a combination amplifier. At the Basic
HAL level this network must be specified since it is a
component which can be used separately. The inputs
to be integrated, X1 and X2, are fed into this network.
The output of the network is fed into a combination
amplifier using the common connection symbol Y3.
Proper capacitive feedback is specified for the
combination amplifier in instruction (4) by having the output
symbol Y4 in the proper position of the input list. A
dollar sign or a blank in an input position indicates
that no input is present at that position.
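
To make the instruction format concrete, the following Python sketch splits one Basic HAL instruction into its output list, micro operator, optional component number and input list. It is a reading aid built only from the format and examples given above, not the FORTRAN IV translator; the function names and the rule that an operator is a run of letters followed by an optional number are assumptions.

```python
import re

# General form: (<output list>) = <micro operator>(<input list>)
# Parentheses around a single output, and the input list itself, may be absent.
_INSTRUCTION = re.compile(
    r"^\s*(?:\((?P<outs>[^()]*)\)|(?P<out>[A-Za-z]\w*))\s*=\s*"
    r"(?P<op>[A-Za-z]+)(?P<num>\d*)\s*(?:\((?P<ins>[^()]*)\))?\s*$"
)


def _fields(text):
    """Split a comma-separated list; '$' or blank marks an unused position."""
    if text is None:
        return []
    return ["" if f.strip() in ("", "$") else f.strip() for f in text.split(",")]


def parse_instruction(line):
    """Return (outputs, operator, component_number, inputs) for one instruction."""
    m = _INSTRUCTION.match(line)
    if m is None:
        raise ValueError(f"not a Basic HAL instruction: {line!r}")
    if m.group("outs") is not None:
        outputs = _fields(m.group("outs"))
    else:
        outputs = [m.group("out")]
    return outputs, m.group("op"), m.group("num") or None, _fields(m.group("ins"))


for text in ("Y1 = REFP",
             "Y2 = POT(Y1,.5)",
             "(,Y3) = RCNC(X1,,,X2,,,)",
             "Y4 = AMPC(,Y3,Y2,Y4,,,,,,)"):
    print(parse_instruction(text))
```

Keeping empty positions as empty strings preserves the positional meaning of each input, which matters for components such as the combination amplifier whose ten inputs have distinct roles.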

Because of the large number of available com- 
ponents and their highly flexible interconnection, 
instructions at the Basic HAL level are often quite 
complicated. This complexity is, however, necessary 
at this level, if complete patchboard representation 
ability is to be achieved. The combination amplifier 
instruction is a good example of this complexity. 
Table V defines all the inputs to the combination 
amplifier in the notation employed in Reference 6. 

In addition to the Basic HAL instructions several
control statements are available. Among them are the
PVALUE control statement which defines parameter
identifiers and assigns numeric values to them, the
HALEND control statement which terminates the
HAL program and the ALOCATE control statement
which is described below.

TABLE V - Combination amplifier instruction

Y = AMPC(X1, X2, X3, X4, X5, X6, X7, X8, X9, X10)

X1:      AJ (Amplifier Junction) Input
X2:      OJ (Operate Junction) Input
X3:      IC (Initial Condition) Input
X4:      F (Integration Feedback) Input
X5, X6:  Mode Control Inputs
X7, X8:  Feedback Capacitor Selection Inputs
X9:      IJ (Initial Condition Summing Junction) Input
X10:     C (Feedback Capacitor Disconnect) Input

The programmer can specify the use of particular 
components on the patchboard by appending the 
analog micro operator with a component number. 
Thus writing 

Y2 = POT02(Y1,.5)

will direct the compiler to allocate potentiometer 02 
when implementing this instruction. Instructions in 
which particular components are specified are effec- 
tively analog machine language instructions. Due to 
this a separate discussion of the analog machine 
language is not necessary. The capability is also pro- 
vided for restricting component allocation to certain 
parts of the analog patchboard. This is done through 
use of the control statement ALOCATE. ALOCATE 
is followed by number pairs which designate sequences 
of trays from which components are to be selected for 
problem implementation. On the EAI 680 trays are 
basic modules into which the board is divided; one 
tray may contain one or more components. Thus the 
statement ALOCATE (000,029) would restrict problem 
implementation to the first thirty trays of the EAI 680 
patchboard. Naturally if it is not possible to find the 
appropriate components or number of components in 
the trays specified a diagnostic message is printed out. 
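
The effect of ALOCATE on component selection can be sketched as follows. The Python below is an illustration under stated assumptions: the text gives only the single-pair form ALOCATE (000,029), so writing several pairs as one comma-separated list is a guess, and the `inventory` mapping of component names to (operator, tray) is hypothetical.

```python
import re


def parse_alocate(statement):
    """Return the set of tray numbers named by an ALOCATE control statement.

    Each (first, last) pair is taken as an inclusive range of trays, so
    ALOCATE (000,029) names thirty trays.
    """
    m = re.match(r"^\s*ALOCATE\s*\(([\d\s,]*)\)\s*$", statement)
    if m is None:
        raise ValueError(f"not an ALOCATE statement: {statement!r}")
    numbers = [int(n) for n in re.findall(r"\d+", m.group(1))]
    if not numbers or len(numbers) % 2:
        raise ValueError("ALOCATE expects pairs of tray numbers")
    trays = set()
    for first, last in zip(numbers[0::2], numbers[1::2]):
        if first > last:
            raise ValueError(f"empty tray range {first},{last}")
        trays.update(range(first, last + 1))
    return trays


def candidates(inventory, operator, trays):
    """List components of the wanted type that sit in an allowed tray."""
    found = sorted(name for name, (op, tray) in inventory.items()
                   if op == operator and tray in trays)
    if not found:
        # Corresponds to the diagnostic message mentioned in the text.
        raise LookupError(f"no {operator} available in the trays specified")
    return found


inventory = {"POT02": ("POT", 2), "POT40": ("POT", 40), "AMPC05": ("AMPC", 5)}
allowed = parse_alocate("ALOCATE (000,029)")
print(len(allowed), candidates(inventory, "POT", allowed))
```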

2. Macro HAL: The Macro HAL language consists
of Basic HAL with additional procedures for generating
macros. In addition, a standard set of system
implemented macro instructions is provided for certain
commonly used instruction groups. A macro instruction
is generated by supplying the assembler with a macro
definition. The Macro HAL instruction format is the
same as the Basic HAL format except that in addition
to the analog micro operator there is an analog macro
operator.

(<output list>) = <analog macro operator>(<input list>)

This analog macro operator is defined in the macro 
definition and is any identifier not already used as a 
micro or macro operator. To call a macro, one simply 
writes the Macro HAL instruction with the appropriate 
macro operator, connection symbols and parameters in
their places. 

Table VI indicates the format required in a macro 
definition. A header, AMACRO, and a trailer, 
AMEND, define the beginning and end of the macro 
definition. The prototype statement is the Macro HAL 
instruction in the format given above. This defines 
the macro operator and the number and position of 
the inputs and output symbols to be expected. The 
next statements required are declaration statements. 
These macro control statements indicate which 
identifiers are to be considered as connection symbols 
and which as parameter identifiers. The body of the
program consists of Macro HAL instructions described
earlier, and Macro Assembly instructions. The Macro
Assembly instructions provide for symbol operations
such as substitution, arithmetic operations on
parameter values and identifiers, and conditional operations
for expansion time component and parameter changes.
With these facilities, very flexible macros can be
written which conditionally adapt the implementation
structure to the requirements of the problem.

TABLE VI - Macro definition format

AMACRO
   “Prototype Statement”
   “Declaration Statements”
   “Body”
      1. Macro HAL Instructions
      2. Macro Assembly Instructions
AMEND
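
The substitution step of macro expansion can be illustrated with a small Python sketch. The macro shown, called INTG1 here, is hypothetical: its body is patterned after the single-input integrators of the automobile suspension listing and is not the system INTG macro, whose definition is not given. The dictionary layout and the scheme for naming internal symbols are likewise assumptions, and the arithmetic and conditional Macro Assembly operations are not modelled.

```python
import re

# Hypothetical single-input integrator macro: Y = INTG1(K, X)
INTG1 = {
    "outputs": ["Y"],
    "inputs": ["K", "X"],
    "locals": ["P", "R"],   # connection symbols internal to the macro
    "body": [
        "P = POT(X,K)",
        "(,R) = RCNC(,,,P,,,)",
        "Y = AMPC(,R,,Y,,,,,,)",
    ],
}


def expand(macro, outputs, inputs, serial):
    """Expand one macro call into Basic HAL text by symbol substitution.

    serial: an integer unique to this call, used to keep the macro's
    internal symbols distinct from one expansion to the next.
    """
    if len(outputs) != len(macro["outputs"]) or len(inputs) != len(macro["inputs"]):
        raise ValueError("call does not match the macro prototype")
    mapping = dict(zip(macro["outputs"], outputs))
    mapping.update(zip(macro["inputs"], inputs))
    mapping.update({name: f"{name}{serial:03d}" for name in macro["locals"]})
    # Replace whole identifiers only, so 'P' does not match inside 'POT'.
    identifier = re.compile(r"\b[A-Za-z]\w*\b")
    return [identifier.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)
            for line in macro["body"]]


for line in expand(INTG1, ["X2"], [".5", "NX2"], serial=1):
    print(line)
```

The call above corresponds to a Macro HAL line of the form X2 = INTG1(.5,NX2) and yields three Basic HAL instructions, which is the kind of reduction in code lines the macro facility is meant to provide.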

Most of the major operators such as integrate, 
sum, etc., associated with digital simulation languages 
and requiring several components for analog imple- 
mentation are provided as system macros. These 
assembly language system macros also represent a 
target language into which the differential equation 
based notation of the Allocated Source Language is 
to be translated. 18 

Processing the hybrid assembly language 

The processing required for HAL is indicated in 
Table IV. The first tasks of syntax checking, macro
expansion and list building result in the production 
of a linked list and several associated tables. Together 
they represent an easily accessed and processed internal 
digital representation of the analog problem. 
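
One way to picture such an internal representation is a symbol table that records, for each connection symbol, which instructions drive it and which use it. The Python sketch below substitutes a dictionary for the linked list and tables of the actual system, whose layout is not described here; the function name, the tuple format for instructions and the wording of the second diagnostic are assumptions. The first diagnostic follows the structure check listed in Table III.

```python
def build_symbol_table(instructions, parameters=()):
    """Index connection symbols by the instructions that drive and use them.

    instructions: list of (outputs, operator, inputs) tuples of strings.
    parameters: identifiers that name parameters rather than connections.
    Returns (table, diagnostics); table maps each symbol to a dict with
    'drivers' and 'users', both lists of instruction indices.
    """
    def is_connection(text):
        if not text or text in parameters:
            return False
        return text[0].isalpha()   # numeric values such as .5 are parameters

    table = {}
    for index, (outputs, _, inputs) in enumerate(instructions):
        for sym in filter(is_connection, outputs):
            table.setdefault(sym, {"drivers": [], "users": []})["drivers"].append(index)
        for sym in filter(is_connection, inputs):
            table.setdefault(sym, {"drivers": [], "users": []})["users"].append(index)

    diagnostics = []
    for sym, entry in table.items():
        if entry["users"] and not entry["drivers"]:
            diagnostics.append(f"{sym} defined as input without being defined as output")
        if len(entry["drivers"]) > 1:
            diagnostics.append(f"{sym} is driven by more than one output")
    return table, diagnostics


program = [
    (["NREF"], "REFN", []),
    (["P16"], "POT", ["NREF", "K3"]),
    (["P17"], "POT", ["X9", ".15"]),   # X9 is never produced by any component
]
table, diagnostics = build_symbol_table(program, parameters={"K3"})
print(table["NREF"])
print(diagnostics)
```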

Scaling the problem and producing static check 
values, though not yet implemented, also occur at this 
level. These tasks may be performed at higher language
levels; however, this level has been chosen to facilitate
rapid on-line programmer interaction. Thus once 
changes have been made in the structure of the analog 
problem on the patchboard, the equivalent changes 
can be made in the internal digital computer represen- 
tation of the problem and new scaling and static check 
information can be requested. 

Allocation of components to the analog patchboard 
is automatically done at this level. When performed 
manually, the allocation involves matching components 
to blocks in a block diagram problem solution. The 
criteria for such assignments are often qualitative and 
include such notions as compactness of patchboard 
wiring and neatness or symmetry in wiring appearance, 
both of which aid in problem debugging. The assign- 
ment itself, though sometimes tedious, is easily effected 
using these visual qualitative criteria. 

When mechanizing the component allocation task
on the digital computer, two main approaches are 
available. The first attempts to give meaning to 
qualitative criteria such as compactness and neatness 
through the development and subsequent optimization 
of appropriate objective functions. Objective functions 
such as wire length, wire crossovers, and area covered
on the board are often used in digital computer backboard
wiring and, to some degree, do reflect the concept
of compactness. The general problem of allocating 
objects (components) to locations on a board, subject 
to restrictions on object placement, with the goal of 
minimizing some objective function is often referred to 
in the literature as the “assignment” or “placement” 
problem. 16 Algorithms for the solution of such optimi- 
zation problems can take several hours of computing 
time 16 when several hundred objects and locations are 
present. This is due largely to the astronomical number 
of ways one can allocate a given problem and the slow 
and not easily predicted convergence properties of 
available algorithms. The large computing time re- 
quirements make this approach unsuitable for a short 
compile time or on line programming system. In ad- 
dition, it is not clear that these objective functions 
meaningfully quantify the qualitative allocation cri- 
teria generally employed by programmers. 
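
The objective function approach, and the reason it becomes expensive, can be seen in a toy Python example. Total wire length is measured here as Manhattan distance on a grid and the optimum is found by exhaustive search; both choices, along with the component names and grid coordinates, are assumptions made for illustration and do not represent the algorithms of the cited literature.

```python
from itertools import permutations


def wire_length(connections, placement):
    """Total Manhattan wire length of a placement.

    connections: iterable of (component_a, component_b) pairs to be wired.
    placement: dict mapping each component to its (row, column) location.
    """
    total = 0
    for a, b in connections:
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += abs(ra - rb) + abs(ca - cb)
    return total


def best_placement(components, locations, connections):
    """Try every assignment of components to distinct locations."""
    best = None
    for chosen in permutations(locations, len(components)):
        placement = dict(zip(components, chosen))
        cost = wire_length(connections, placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


components = ["POT", "RCN", "AMP"]
connections = [("POT", "RCN"), ("RCN", "AMP")]
locations = [(0, 0), (0, 1), (0, 2), (1, 0)]
cost, placement = best_placement(components, locations, connections)
print(cost, placement)
```

Three components and four locations give only 24 assignments, but the count grows factorially with problem size, so exhaustive search is out of the question for several hundred objects and practical algorithms must rely on iterative improvement with the convergence difficulties noted above.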

The second approach to the component allocation 
task is to develop a set of heuristic algorithms which 
try to embody concepts such as compactness and neat- 
ness while at the same time keeping computing costs 
at a minimum. As with most heuristic algorithms, the
one currently implemented in this system has worked
well on most, but not all, of the problems it has
encountered. In every problem it does, however, find
a legal allocation if one exists. The basic assumptions
of the heuristic are given below. 

a. Components used as integrators and summers 
are generally the key elements in determining the way 
a programmer patches a board, or draws a flow diagram. 
These components should therefore be allocated to 
preserve, as much as possible, the visual signal flow 
patterns between them. 

b. Patching situations such as initial condition 
pots and pots tied to the inputs or output of amplifiers 
should be considered as special cases. In many of these 
and other cases, the patchboard of the EAI 680 has 
been designed for neatness and compactness by pro- 
viding special patchplugs which may be used instead 
of wires. Since plug patching is both neat and repre- 
sents a minimum wire length it should be utilized 
where possible. 

c. A certain amount of patching compactness is 
desirable. The remainder of the components should 
therefore be allocated on the basis of their closeness
to already allocated components. 
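
Assumption c suggests a simple greedy rule, sketched below in Python. This is not the allocation algorithm implemented in the system, whose details are not given in this paper; it only illustrates placing each remaining component at the free location closest to the already allocated components it connects to. The function name, the grid coordinates and the use of Manhattan distance are assumptions, and component type compatibility is ignored.

```python
def allocate_by_closeness(unplaced, connections, placed, free):
    """Greedily place components next to already allocated neighbours.

    unplaced: component names still to be allocated, in processing order.
    connections: (a, b) pairs of connected components.
    placed: dict component -> (row, col) for components allocated earlier.
    free: list of (row, col) locations still available.
    Returns a new placement dict; raises LookupError if locations run out.
    """
    placement = dict(placed)
    free = list(free)
    for comp in unplaced:
        if not free:
            raise LookupError(f"no location left for {comp}")
        # Locations of the neighbours of comp that have already been placed.
        neighbours = []
        for a, b in connections:
            if comp not in (a, b):
                continue
            other = b if a == comp else a
            if other in placement:
                neighbours.append(placement[other])

        def cost(loc):
            return sum(abs(loc[0] - r) + abs(loc[1] - c) for r, c in neighbours)

        best = min(free, key=cost)   # ties go to the earliest free location
        placement[comp] = best
        free.remove(best)
    return placement


result = allocate_by_closeness(
    unplaced=["POT1", "POT2"],
    connections=[("POT1", "AMP1"), ("POT2", "POT1")],
    placed={"AMP1": (0, 0)},
    free=[(3, 3), (0, 1), (1, 1)],
)
print(result)
```

A rule of this kind runs in time proportional to the number of components times the number of free locations, which is the sort of cost a short compile time or on-line system can afford.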

These assumptions form the basis of a three phase 
allocation algorithm with each phase corresponding 
to one of the assumptions above. The details are not 
discussed in this paper. The example provided in the
next section, however, demonstrates the results of the
algorithm.

TABLE VII - Basic HAL input for automobile suspension problem


An example 

Figure 2 contains the scaled block diagram of an 
automobile suspension simulation. 16 Table VII is an 
input listing of the problem as represented in the Basic 
HAL language. The same problem represented with 
the use of the system macros integrate (INTG), sum- 
mation (SUM), and invert (INVT) is given in Table 
VIII. There is approximately a three to one reduction 
in code lines required when the problem is represented 
using the macro facility and this representation is 
reasonably clear and compact. The resulting allocation 
of the problem to the EAI 680 patchboard is given 
in Figure 3 and indicates that many of the problem’s
visual signal flow patterns have been preserved.

The programming system which to date includes 
the syntax checking, and the macro, listbuilding and 
allocation processing described previously has been 
implemented in FORTRAN IV. For the example 
above, this processing took approximately four seconds 
when executed on a Univac 1108 computer. A more 
complex problem, the Cable Arrestor Problem, 17 con- 
taining roughly twice as many components took nine 
seconds. 

CONCLUSIONS 

This paper proposes a hybrid programming system 
in terms of four language levels and the processing 
required between them. Some of the details of the 
lowest language levels which have been implemented 
are presented and an example demonstrating the use 
of the system is given. 

Currently the authors are engaged in completely
specifying the modifications necessary for transforming
CSSL into a desirable allocated source language. A
continuing study is also being made of the interface
requirements between the analog and digital program
subsections. In addition implementation continues on
the lower level processing tasks, the initial goal being
a subsystem which handles the analog subsection of
a program at the Macro HAL level, and contains a
limited interactive mode (Figure 1) capable of on-line
scaling and static checking in response to patchboard
configuration changes.

Programming costs for hybrid computers have
mushroomed to the point where the economic justification
of hybrid simulation projects is being questioned. It
is hoped that this proposal will both stimulate
discussion in this area and fill a current and growing need
for an effective hybrid programming system.

Figure 2 - Simulation of an automobile suspension system

Table VII listing (Basic HAL input):

PVALUE (K1 = .16, K2 = .5, K3 = .8)
P9 = POT(S3, K1)
P10 = POT(X1, .5)
P11 = POT(C2, K2)
P12 = POT(J4, .4)
P13 = POT(X1, .1)
P14 = POT(X2, .5)
P15 = POT(J4, .5)
P16 = POT(NREF, .5)
P17 = POT(NX2, .15)
P18 = POT(X2, .5)
P19 = POT(S8, K3)
P20 = POT(S3, .2)
(,R1) = RCNC(,,,P9,P12,,)
(,R2) = RCNC(P10,,,,,)
(,R6) = RCNC(,,,P15,P20,P19,)
(,R7) = RCNC(,,,P17,,,)
R3 = RCNS(P11,P14,,,,,S3)
R8 = RCNS(P16,P18,,,,,S8)
R5 = RCNS(P13,NX2,,,,,S5)
J4 = AMPJ(S5,,)
X1 = AMPC(,R1,,X1,,,,,,)
C2 = AMPC(,R2,,C2,,,,,,)
NX2 = AMPC(,R6,,NX2,,,,,,)
X2 = AMPC(,R7,,X2,,,,,,)
S3 = AMPS(R3,,,,,)
S8 = AMPS(R8,,,,,)
S5 = AMPS(R5,,,,,)
NREF = REFN
HALEND

TABLE VIII - Macro HAL input for automobile suspension problem

PVALUE (K1 = .16, K2 = .5, K3 = .8)
X1 = INTG(K1, S3, .4, J4)
C2 = INTG(.5, X1)
NX2 = INTG(.5, J4, .2, S3, K3, S8)
X2 = INTG(.5, NX2)
S5 = SUM(.1, X1, 1, NX2)
S3 = SUM(K2, C2, .5, X2)
S8 = SUM(.5, X2, 5, NREF)
J4 = GAIN(S5, 1)
NREF = REFN
HALEND

Figure 3 - Automobile suspension problem allocated to EAI 680 patchboard
Hybrid executive - User’s approach


by W. L. GRAVES and R. A. MacDONALD 

TRW Systems Group 
Redondo Beach, California 


INTRODUCTION 

Hybrid executive programs have long been prevalent
in the hybrid computer simulation industry; however,
what the essential features of a hybrid executive
should be is still a controversial subject. For the most part,
the design of hybrid executives has been undertaken 
by the manufacturers of hybrid systems and in many 
designs the complexity in the operation of these 
programs has resulted in their usage only on large 
class digital systems. Consequently, hybrid facilities 
which employ a small to medium class digital computer 
system are faced with the task of developing an exe- 
cutive program compatible with the facility environ- 
ment. However, in many of these small to medium 
hybrid facilities, the segregated program development 
effort for a hybrid executive is not undertaken until 
considerable time after the installation of the hybrid 
system. The normal reasons are inadequate program- 
ming funds or a higher priority assignment of available 
personnel to satisfy programming and development 
needs of existing hybrid simulations. 

For hybrid computation, specifications for the ex- 
ecutive design must include sufficient flexibility to enable 
the user to easily alter the mode of the executive 
execution at run time as well as at compilation time 
to meet the requirements of the particular engineering 
problem being simulated. In hybrid executives existing 
today, such flexibility does not generally exist. These 
executives usually consist of a conglomeration of many 
programs that perform specific functions and are 
linked together only to the extent that the order of 
their execution is controlled by a simple monitor. 
However, the nature of these functions is such that 
the provision of linkage between control and problem 


data could considerably reduce the complexity of their 
implementation while increasing flexibility. 

In this paper, the philosophy for a hybrid executive
design, which has evolved from extensive user
experience, is described. Since it is a user philosophy, it is
unusual in the hybrid simulation industry,
wherein most designs are specified by “software
experts” who usually have attained their expertise
in an all-digital environment. A definition of the term
“user” is in order. A user is defined as a person in the 
role of either an applications programmer or engineer- 
ing analyst as opposed to a system software program- 
mer or analyst. The hybrid executive (hereafter re- 
ferred to as the TRW executive) discussed in this 
paper was primarily developed to satisfy the simu- 
lation requirements for a large aerospace engineering 
problem. However, the authors feel that the extended 
usage of this executive to other applications, whatever 
the size, is reasonable. The general requirements for 
this problem and the rationale used in the design of 
the executive programs are discussed. 

Typical executive requirements for hybrid simulation 

In early 1967, the TRW Analog/Hybrid Facility
was requested to develop a large multi-use
hybrid simulation capability in support of the Apollo
program. For this study, which involved several
independent simulations, each basically simulating two
vehicles in six degrees of freedom (6 DOF) and employing
as many as two control systems for each vehicle, it became
very apparent that total executive control for each of these
simulations would be required for the following reasons:

• The size and complexity of the simulations would 
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require an extensive daily checkout to assure 
simulation readiness. To accomplish this task by 
manual means on the analog would be impractical, 
and therefore, potentiometer setup and static 
checkout using digital control would be required. 
Also, since it was expected that the definitions of 
the simulation state would change frequently, 
either due to changes in parameters or to different 
selections of program options, the pot setup and 
static checkout programs should have sufficient 
flexibility to assure analog or system readiness 
for the current simulation definition. 

• Complete flexibility in the data input and output 
formats, such that either the simulation staff or 
the various engineering analysts assigned to this 
project could communicate with the simulations 
in a familiar, user oriented, language and without 
burdened details of specific data formats. 

• A large simulation staff of programmers of varying 
experience and backgrounds would be assigned to 
the program, therefore, generalized software to 
handle control such as interrupts, analog/digital 
interface, sampling, etc., need be developed such 
that program interfacing would not be a difficult 
task. 

• Because of the size and complexity of the simu- 
lation and because of an additional requirement 
to be able to use the simulations for a multiple of 
studies, scaling of both amplitude and time would 
be difficult to specify prior to execution. Therefore, 
the capability to rescale at run time would be 
necessary to reduce considerably the recompila- 
tions required if this information is fixed within 
the program. 

• A requirement to display the dynamic status of 
up to 'several hundred variables either digitally 
and/or via the analog would be necessary. Because 
digital display using a line printer during problem 
execution would be time prohibitive, a dynamic 
dump capability to external bulk storage (disc 
drives or magnetic tapes) for later recovery or 
f urther processing would be required. 

• Because of the potential multiple of uses for the 
simulation programs, data I/O requirements from 
study to study would be expected to vary con- 
siderably. Since it would be highly inefficient to 
recompile the programs for each new I/O con- 
figuration, the executive capability must include 
a means for defining the I/O processes at exe- 
cution time rather than at compilation time. 


• Since the total digital program storage require- 
ments were expected to exceed available memory, 
the executive program structure must provide 
capability for program overlay and data interfacing
in a manner not overburdening to either the user or
the respective programmers.

In satisfying the requirements for executive control 
of the Apollo simulations, two important constraints 
were applied. First, development and design effort of 
the executive must be done within the budget and 
schedule allotted by the Apollo simulation task, and 
second, sufficient generalization and compatibility must 
be maintained in the design for adaptation to other digi- 
tal software systems, if necessary, during the simulation 
effort. This latter constraint implies that the design 
and implementation should not require modification 
of software provided by the computer manufacturer
(loader, compiler, I/O, etc.) for operation.

Evolution of the executive design and development 

In the Hybrid Computation Facility at TRW 
Systems Group, which currently employs a medium 
class digital computer (CDC-3100) linked to four 
analog computers (two Beckman 2132’s and two Com- 
cor CI-5000’s), a generalized hybrid executive program 
was not available for nearly three years from the time 
of installation in 1964. A reasonable software develop- 
ment activity within TRW could not be initiated with 
the available personnel because of commitments to
simulation development for several large programs. 
Prior to late 1967, executive control for hybrid simu- 
lations was tailored specifically to fulfill the require- 
ments for the particular study and was generally not 
applicable from study to study. However, valuable 
experience had been gained in realizing, from a usage 
point of view, the total requirements and capabilities 
for a generalized hybrid executive program. 

Upon the initiation of the Apollo simulations in 
1966, two approaches for developing a hybrid executive 
were considered. One approach was to develop a com- 
plete executive separate from the problem imple- 
mentation and later integrate the two programs for 
final checkout. A second approach was to develop the 
executive in parallel with the problem implementation 
and integrate and check out the combined modules of 
the simulation as they were developed. From the 
stringent Apollo simulation schedule, it was apparent 
that the latter approach would be more feasible. Conse- 
quently, the design evolution of the executive was 
dictated by satisfying the particular simulation re- 
quirements at the time of implementation. As a result, 
many of the capabilities presently existing in the TRW
executive have resulted from second or third generation 
design changes as user flexibility and program efficiency 
so required. 

Program description 

Several basic philosophies were adhered to during 
the executive design and development: 

1. Any information required in defining the simu- 
lation which may change frequently is entered 
as data at run time. This class of information 
includes items such as scale factors, linkage 
assignments, analog component or console 
assignments, required program sequencing con- 
trol flags and all problem parameters. 

2. Any information that is changed only if the
engineering system being studied is redefined is
compiled into the system. This would include
items such as problem equations, etc.

3. All control or problem executions which are
non-time critical, that is, not required for the
dynamic execution of the problem, need not
reside in memory during the time critical
execution. Functions such as pre-data and post-data
processing, initialization, pot value determination
and setting, static check determination
and interrogation are non-time critical
and are usually executed once per run sequence
and therefore may be program overlayed, thus
optimizing or reserving resident core for the
time critical or “Real Time” program.

4. All data values required to transfer information
or problem status between major program
functions must reside in core using a “COMMON”
reserved data area. It is this important constraint
on implementation that permits the
usage of program overlaying and aids significantly
in the executive design.

Five or six separate computer functions or programs 
can be defined, which satisfy the total simulation 
requirements: data I/O processing, initialization, pot 
evaluation and setting, real time execution, static 
check evaluation and interrogation, and possibly, post 
data processing. Figure 1 depicts the general organization
of these functions. It should be noted that the
order of execution of these functions is completely 
determined by the user at run time from data input, 
and that any single function can be executed sepa- 
rately or by an automatic sequencer. 

Figure 1 — Hybrid executive program structure

Since overlay processing is used, each function
or program usually, though not necessarily, comprises
a separate computer overlay, with each in turn further
overlayed (with the exception of the real time program)
as increased core requirements are experienced. Each of
these programs is executed by a simple driver or
monitor upon command by the user, utilizing the
resident COMMON for data transfer. In the following
sections, the design for each of the five major programs
and the control of their execution is briefly discussed.
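The run-time sequencing described above can be pictured with a small sketch. It is only an illustration under stated assumptions, not the TRW implementation: the request names (DATAIO, INIT, POTSET, RUN, STATIC), the function names, and the use of a dictionary in place of the resident COMMON area are all hypothetical.

```python
# Hypothetical stand-ins for the overlays; each records that it ran in COMMON.
def data_io(common):
    common.setdefault("log", []).append("data_io")

def initialization(common):
    common.setdefault("log", []).append("initialization")

def pot_setting(common):
    common.setdefault("log", []).append("pot_setting")

def real_time(common):
    common.setdefault("log", []).append("real_time")

def static_check(common):
    common.setdefault("log", []).append("static_check")

# Request names are illustrative assumptions, not taken from the paper.
FUNCTIONS = {
    "DATAIO": data_io,
    "INIT": initialization,
    "POTSET": pot_setting,
    "RUN": real_time,
    "STATIC": static_check,
}

def run_sequence(requests, common):
    """Execute the requested functions in the order given by the user."""
    for name in requests:
        if name not in FUNCTIONS:
            raise KeyError(f"unknown function request: {name}")
        # All data passes between functions through the shared COMMON area.
        FUNCTIONS[name](common)
    return common["log"] if requests else []
```

A single function can be run alone (`run_sequence(["STATIC"], common)`) or several can be chained, which mirrors the choice between separate execution and the automatic sequencer.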

Data I/O processing 

Because the most frequent interaction between the 
user and the system occurs through the I/O portion of 
the executive, special attention is warranted to make 
the interaction as painless as possible. Since the external 
characteristics of entering both data and action requests 
are identical for the TRW executive, the following 
comments generally apply to both classes of infor- 
mation. 

The essential task performed by I/O software is
the conversion between the data representations required
externally and those used internally by the computer.
Each time an item of
information is processed for I/O, a description of the 
item sufficient to allow conversion must be available. 
The TRW executive requires inclusion of descriptors 
that specify the following. Names entered must be 
defined as data identifiers or action request identifiers.
The internal classification of the data, REAL, INTE- 
GER, OCTAL, etc., must be specified. Differentiation 
must be made between data that is part of an array 
and data that is not. Conversion from one set of engi- 
neering units to another is also allowed and must be 
specified. 

Clearly, any I/O format that requires specification 
of all of these descriptors every time an item is referenced
is untenable. In the TRW executive, the approach
used to reduce the problem requires the user to provide 
a list of all names that are to be accepted and the 
required descriptors of each. Specification of the 
required descriptors is done using FORTRAN oriented 
names such as REAL, INTG, etc. This list is compiled 
to allow ease of linkage with appropriate I/O handling 
routines. Once the list is defined, entry of data requires 
only a name and a numeric value. Since all conversion 
is pre-specified, no artificial indicators, such as a 
decimal point to specify a floating point number, are 
required. Since it is reasonable to expect the descriptors 
defined for each data value will not change unless the 
problem definition changes, no appreciable loss in 
flexibility for I/O processing is realized when the de- 
scriptor list is compiled. 
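As a rough illustration of a pre-specified descriptor list, the sketch below keeps one descriptor per accepted name and converts an entered value without any type indicator in the value itself. The field names, the sample variable names, and the unit factor are assumptions made for the example; only the type names REAL, INTG and OCTAL come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptor:
    kind: str                 # "DATA" or "ACTION"
    dtype: str = "REAL"       # "REAL", "INTG" or "OCTAL"
    is_array: bool = False
    unit_factor: float = 1.0  # multiplier from external to internal units

# Hypothetical descriptor list; in the executive this list is compiled.
DESCRIPTORS = {
    "MASS": Descriptor("DATA", "REAL", unit_factor=0.5),
    "NSTEP": Descriptor("DATA", "INTG"),
    "FLAGS": Descriptor("DATA", "OCTAL"),
}

def convert(name, text):
    """Convert an entered value using only the descriptor stored for its name."""
    d = DESCRIPTORS[name]
    if d.dtype == "REAL":
        # No decimal point is needed in the input: the descriptor decides.
        return float(text) * d.unit_factor
    if d.dtype == "INTG":
        return int(text)
    if d.dtype == "OCTAL":
        return int(text, 8)
    raise ValueError(f"unsupported type {d.dtype!r} for {name}")
```

With this arrangement an entry such as `MASS 10` is stored as a floating point value even though no decimal point was typed.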

The internal definition of conversion requirements 
also permits extremely simple definitions of display 
requirements. In this case, the data value already 
exists within the computer and only the name of a 
variable is necessary to complete the information 
needed within the computer to define output requirements.
Indirectly, this has allowed requesting all
display functions by simply entering a list of names. 
The implications contained here are best illustrated 
in the case of specifying “Dynamic Dump” require- 
ments. This is an output function that should be time 
optimized. Unfortunately, optimization of a routine 
to output floating point data requires different in- 
structions than those needed for output of fixed point 
variables. In view of this, a problem arises when it is 
desired to intermix floating and fixed point numbers 
in a single general request list. The TRW executive, 
since it has access to all pertinent descriptive infor- 
mation, can handle this problem internally without 
the user even being aware that it is happening. The 
allowance of such mixed mode lists is provided for 
printing and dynamic dumps. 

Another problem often encountered in trying to 
enter data into a computer is caused by the presence 
of rigid format structures such as requiring that items
be aligned to specific card columns. Where users are 
often required to hurriedly keypunch or type in their 
own data for performance of runs, such rigidity becomes 
too restrictive. Thus, one design criterion for the I/O 
package was the elimination of this problem. A solution 
was achieved through use of an input string scanning 
routine which searches an entire input record for 
appropriate data fields. 
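A column-independent scan of an input record might look like the following sketch. The field syntax assumed here (a name followed by a value, optionally separated by an equals sign) is an assumption for illustration; the paper does not give the actual record syntax.

```python
import re

# A name, then "=" or blanks, then a numeric field, anywhere in the record.
_FIELD = re.compile(
    r"\b([A-Za-z][A-Za-z0-9]*)(?:\s*=\s*|\s+)"
    r"([-+]?[0-9.]+(?:[eE][-+]?[0-9]+)?)"
)

def scan_record(record):
    """Return (NAME, value-text) pairs found anywhere in one input record."""
    return [(name.upper(), value) for name, value in _FIELD.findall(record)]
```

Because the whole record is searched, the fields may start in any column and may be separated by commas or arbitrary blanks.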

In the case of action requests, two forms exist and
are distinguished only by their manner of use. The
first form, which is the larger class, is referred to as
an I/O action request and the functions performed
are restricted to various manipulations of data. Requests
for saving program status on a disk or transferring
data from cards to tape are examples. Basic
to this class is the requirement that the subroutine
used to process the request returns control to the
executive input/output controller. The second class of
action request, referred to as program execution 
requests, is used to initiate execution of hybrid functions 
not related to I/O. In this case the routine used to 
satisfy the request passes control to the executive 
execution sequence controller rather than the I/O 
controller. In both cases, the specification of the 
request to the executive program is the same and the 
user implies through his own subroutine the class to 
which the request belongs. Figure 2 shows the control
used for I/O processing and how this control
interfaces with the executive control of the
functional blocks indicated in Figure 1.

Potentiometer evaluation and setting 

As part of performing each and every computer run, 
potentiometers must be set to the proper values. In 
most small and medium sized hybrid labs the ability 
to do this from the digital is provided with one of 
two levels of sophistication. The first requires specifying 
the address of the potentiometer to be set and the 
value to which it must be set. The second requires 
specification of the potentiometer address, parameter 
values and a FORTRAN-like expression used in computing
the setting. The latter then both computes the
setting and automatically sets the potentiometer using
an interpretive compiler. Both of the methods require 
that the user select those potentiometers whose settings 
will change. This selection is based on the engineer’s 
knowledge of parameter value changes, and in lumped 
parameter definitions or where the same parameter 
is used repeatedly throughout the problem, this can 
be very cumbersome. 

The TRW executive automatically includes the 
necessary setting changes in the digital program and 
thus relieves the user of an unnecessary burden. Since 
the actual setting is the only number associated with a 
potentiometer that reflects parameter variations, it is 
used to initiate resetting of potentiometers. The method 
used is as follows: a list containing all setting values 
is retained on bulk storage; as part of each run, all 
potentiometer settings are computed and compared to 
the list; a difference between the two values automati- 
cally results in a resetting of the potentiometer and 
the list being changed to reflect the new value. 
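The compare-and-reset method can be sketched as follows. This is a minimal illustration, assuming a dictionary in place of the bulk storage list and a caller-supplied `set_pot` routine; the tolerance argument is an added assumption, since the text speaks only of a difference between the two values.

```python
def update_pots(stored, computed, set_pot, tol=0.0):
    """Reset only those potentiometers whose computed setting has changed.

    stored   -- address -> last setting (stand-in for the bulk storage list)
    computed -- address -> setting computed for this run
    set_pot  -- hypothetical routine that physically sets one potentiometer
    """
    reset = []
    for address, new_value in computed.items():
        old_value = stored.get(address)
        if old_value is None or abs(new_value - old_value) > tol:
            set_pot(address, new_value)
            stored[address] = new_value  # keep the list current
            reset.append(address)
    return reset
```

The user never names the potentiometers to be reset; any parameter change shows up as a changed setting and is picked up by the comparison.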

Although the concept used is very simple, there are 
implications that markedly affect program imple- 
mentation. The most pertinent of these is the require- 
ment that all current parameter values be available 
to the program which computes the potentiometer 
settings. Making these values easily available while
retaining the speed necessary to make computation
of all settings feasible requires compilation of the
setting evaluation routine instead of using an interpretive
routine, as do many of the hybrid computer manufacturers.
Clearly, interpretive methods offer considerable
flexibility in specifying potentiometer values,
but the authors believe that this degree of flexibility
is not necessary.

Before clarifying this point of view, a definition is 
in order. 

Let $P_s = A_{sf} \cdot D_P$
where $P_s$ = Potentiometer setting
$D_P$ = Pot definition
$A_{sf}$ = Analog scale factor

Assuming the reader is familiar with the meaning of
“potentiometer setting” and “analog scale factor” the
given equation will suffice to define “Pot Definition”.
The important characteristics of a pot definition are 
its dependency only on physical parameter values and 
its corresponding independence of potentiometer ad- 
dress or analog scale factor. 


Dependency only on physical parameters implies 
that a “pot definition” changes only when the problem 
being solved is redefined in a manner such that 
equations are changed, which in most simulations is 
relatively infrequent. Thus, if the “potentiometer 
setting” program requires recompilation only when 
“pot definitions” are changed, no significant loss of 
flexibility is encountered. 

As a result of these considerations, the routine was 
formulated such that analog scale factors and potentiometer
addresses were entered as data and “pot definitions”
were coded into a FORTRAN subroutine.
Use of this method utilizes the full capability of
COMMON while retaining the flexibility at run time 
in specifying values (scale factor, component address) 
most likely to vary. 

Since the program (Figure 3) necessary to compute 
the actual setting (i.e., form the product of the “pot 
definition” and the scale factor), compare old and new 
values and handle bulk storage files is the same for 
any problem, it is formulated as part of the executive. 
Definition of the pot setting requirements for a given 
problem consists of coding the FORTRAN list of 
“pot definitions” and preparing the list of pot addresses 
and scale factors. The analog data associated with a 
pot is stored in a serial file on a disk. This data consists 
of the pot address, analog console number, analog 
scale factor, present value of pot setting, and an index
(I) which defines where in an array ($D_P$) the value of
the pot definition is stored by the FORTRAN routine
used to evaluate the pot definitions.

Figure 3 — Potentiometer setting control
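The division of labor between the compiled pot definitions and the data records can be illustrated with the sketch below. The record fields follow the list given in the text; the Python names, the sample equations, and the parameter names (`k`, `c`, `m`) are hypothetical and stand in for the user's FORTRAN routine.

```python
from dataclasses import dataclass

@dataclass
class PotRecord:
    address: str         # pot address
    console: int         # analog console number
    scale_factor: float  # analog scale factor
    setting: float       # present value of the pot setting
    index: int           # where in the definition array this pot's value is stored

def pot_definitions(params):
    """Stand-in for the compiled routine; depends on physical parameters only."""
    return [params["k"] / params["m"], params["c"] / params["m"]]

def compute_settings(records, definitions):
    """Form each setting as scale factor times pot definition."""
    return {r.address: r.scale_factor * definitions[r.index] for r in records}
```

Changing a scale factor or a pot address touches only the data records, while the definition routine is recompiled only when the equations themselves change.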

Initialization or finalization 

In engineering simulations, the analyst prefers to 
have the mechanization in a form which is either 
familiar to him or closely related to the physical system 
being simulated. In digital simulation, the analyst is 
usually far removed from the program, and if his 
results are of a suitable form, the actual formulation 
of the equations is of little interest and can therefore 
be optimized for computer efficiency and stability. 

To the contrary, in analog or hybrid simulation 
where a close rapport with the program is desirable, 
mechanization in either an optimum or in a less com- 
puter sensitive manner is often traded off against a 
more realizable formulation. As an example, an analyst 
might have access only to data determined in a refer- 
ence frame that differs from the reference frame best 
suited for use within the computer (e.g., gimbal angles
vs direction cosines). In such cases, reformulation of 
values for computer initialization may require extensive 
computation. In hybrid computer simulations the 
digital computer can be used to determine such values 
regardless of the complexity. With this capability, the 
total simulation can be formulated for optimum exe- 
cution, and often better computer stability, and the 
results transformed to the user's preference without
decreasing the flexibility to the user or analyst. 

To accomplish the reformulation from the
user desired input form to the program execution form
and then to the user desired output form, non-time
critical digital calculations, which may be considerable,
must be performed. Examples would be coordinate
transformations, root extraction, curve fitting, data analysis,
etc. Since these types of calculations are executed only 
once each run cycle, they can be programmed using 
FORTRAN, extended precision, and non-optimal 
programming techniques with a negligible increase in 
the system execution or throughput. It is this purpose 
that the preinitialization and/or finalization programs 
serve. Since these programs are entirely dependent on 
the problem being simulated, the only executive 
function is the call to these programs and the pro- 
vision for data linkage through the use of COMMON. 

Real time program 

The specification of software that would appreciably 
aid in getting the real time program operational was 
based on a generalization of the kind of problem that
would be solved using the system. It was assumed
that the physical system being studied could consist 
of several interacting subsystems each having a unique 
frequency content. For example, in the Apollo studies 
the kinematics and dynamics are frequency separable. 
This kind of system implied a computer program 
consisting of several loosely interacting subprograms 
each having its own timing and sampling requirements. 
Two primary questions to be answered were “What 
can an executive do that will provide assistance in 
programming each subprogram?” and “What aid may 
be provided in correlating the subprograms to represent 
a complete system?” 

Two facts immediately suggested general answers to 
the questions above. Because each simulation repre- 
sents a different system, the equations solved in the 
real time program are essentially unique for each new 
problem, and can be considered only by the user. At 
the same time, certain functions such as mode control 
and inter-computer data transfer are common to all 
simulations and characteristically depend only upon 
the computer system being used. Experience has shown 
that the user normally displays considerable ability 
to solve problems associated with his equations, but that
his performance deteriorates markedly when dealing 
with computer system dependent functions. Clearly, 
a general executive can only address itself to aid in 
handling the computer system dependent problems 
present in the real time program. It is also clear, 
however, that these are the areas where the user most 
needs aid. 

The TRW executive includes three major activities 
within the real time part. It provides generalized software
to handle ADC/DAC specification, control of the
“dynamic dump” (time histories), and mode control
and interrupt processing. Relating these activities to
the questions above, it is found that generalization of 
these functions provides assistance to the user both in 
programming individual sub-programs and in overall 
system correlation. Justification of this last statement 
requires a more detailed description of each of the 
functions considered and their interaction with the 
user. 

Although extensive details pertaining to the methods 
used in implementing the TRW executive are not 
appropriate, some indication of the gross approach used 
is appropriate. The available interrupt structure allows 
execution of up to eight concurrent real time sub- 
programs (this limit of eight is caused by the maxi- 
mum number of programmable interrupts available in 
the TRW hybrid system). A subprogram naming con- 
vention has been adopted to allow flexibility in choosing 
the interval at which variables are stored on bulk
storage or at which variables are transferred for display 
purposes. The subprograms are arbitrarily named 
LOOP1, LOOP2, etc., up to the maximum number 
allowed by the interrupts available. The number 
associated with the loop is then used as a key to initiate 
certain action. In the case of dynamically dumping 
variables, the following scheme is used: each sub- 
program includes a call to the routine which performs 
the dump operation; the parameter passed with the 
call is the loop number; this number is compared to a 
number entered as data which specifies the subprogram, 
and thus, the time interval at which the dump is to be 
made; if the numbers compare, a dump occurs. A 
similar system is used for selecting inter-computer 
display transfers. 
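The loop-number key for dynamic dumps can be sketched as follows. This is an illustration only: `make_dumper`, the use of a list as the bulk storage sink, and the dictionary of variables are assumptions; the comparison of the passed loop number against a number entered as data is the scheme described in the text.

```python
def make_dumper(dump_loop, variables, sink):
    """Build the dump routine that every LOOPn subprogram calls.

    dump_loop -- loop number entered as data; selects the dump interval
    variables -- name -> current value (stand-in for the dump list)
    sink      -- list standing in for bulk storage
    """
    def dump(loop_number, time):
        if loop_number != dump_loop:
            return False          # numbers do not compare: no dump
        sink.append((time, dict(variables)))  # snapshot of current values
        return True
    return dump
```

Each subprogram makes the same call with its own loop number, so moving the dump to a different rate only requires changing the number entered as data.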

Associated with each subprogram are the following 
parameters which may be entered as data: 

Present problem time
Time interval at which the subprogram is executed
Address of the first ADC channel used
Number of ADC channels used
Address of the first DAC channel used
Number of DAC channels used
Interrupt priority level

This data is stored as blocks in a predefined order 
known to each executive subroutine used in the real 
time program. Such a block structure permits usage 
of the same calling sequence to execute all executive 
subroutines, thereby reducing the chance of program- 
mer error to a minimum. 
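One way to picture the block structure is a record whose fields appear in a fixed order, with every executive routine taking the block as its only argument. The field and routine names below are hypothetical; the list of parameters is the one given in the text.

```python
from dataclasses import dataclass

@dataclass
class LoopBlock:
    time: float       # present problem time
    interval: float   # time interval at which the subprogram is executed
    first_adc: int    # address of the first ADC channel used
    n_adc: int        # number of ADC channels used
    first_dac: int    # address of the first DAC channel used
    n_dac: int        # number of DAC channels used
    priority: int     # interrupt priority level

# Every routine takes the same single argument: the subprogram's block.
def adc_channels(block):
    return list(range(block.first_adc, block.first_adc + block.n_adc))

def dac_channels(block):
    return list(range(block.first_dac, block.first_dac + block.n_dac))

def advance_time(block):
    block.time += block.interval
    return block.time
```

Because the calling sequence never varies, a programmer cannot pass the wrong channel count or interval to one routine and the right one to another.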

ADC/DAC specifications

The handling of ADC/DAC specifications within a 
program would seem to present little difficulty since 
even the most sophisticated DAC or ADC routine 
should require no more than three or four parameters. 
However, in many systems, specification of these 
parameters requires compilation. Such a requirement 
not only removes flexibility by requiring recompilation 
to incorporate changes, but also forces the assignment 
of specific equipment in a relatively early stage of 
program development. At the time a particular sub- 
program is written, it is usually not convenient to 
assign specific ADC’s or DAC’s since requirements 
for all subprograms must be considered in determining 
the best distribution. Similarly, conversion scale factors 
may change at any time. Another capability convenient
for the user is flexibility in specifying inter-computer
data transfer for purposes of display. This
requires specification of specific DAC’s or ADC’s, the
variables to be transferred, the scale factors to be used, 
and the time interval at which the transfer occurs. In 
view of these considerations, it seems reasonable to 
require software that allows assignment of all param- 
eters associated with ADC’s and DAC’s at run time. 

The actual assignments are made by entering two 
lists of data; the first containing the names of the 
variables to be transferred, and the second containing 
the conversion scale factor. The lists are entered in an 
order corresponding to the ADC or DAC line that is 
being used. In the I/O processor, the list of names is 
replaced by a list of the addresses of those names and 
this, along with the scale factor list, is passed to the real
time program for tailoring of the specific transfer 
routines for a run. Because intermixing of floating 
and fixed point computations within the same program 
is rarely encountered, DAC and ADC lists have been 
restricted to include either floating point variables or 
fixed point variables, but not both. This enables per- 
formance of ADC/DAC functions in simple indexed 
loops which are easily tailored. 
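The two run-time lists and the simple indexed loop they allow can be sketched as below. The names `build_transfer` and `write_dacs`, the dictionary standing in for COMMON, and the caller-supplied `write_dac` routine are assumptions; in the executive the names are replaced by addresses, for which dictionary keys serve here.

```python
def build_transfer(names, scale_factors, first_channel):
    """Pair each variable with its line and scale factor, in list order."""
    if len(names) != len(scale_factors):
        raise ValueError("name list and scale factor list differ in length")
    return [
        (first_channel + i, name, factor)
        for i, (name, factor) in enumerate(zip(names, scale_factors))
    ]

def write_dacs(table, common, write_dac):
    """One indexed loop over the tailored table performs the whole transfer."""
    for channel, name, factor in table:
        write_dac(channel, common[name] * factor)
```

Reassigning a variable to another DAC line, or changing its scale factor, is then a matter of editing the two data lists rather than recompiling.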

Dynamic dump 

The capability for dynamically dumping variable 
values onto bulk storage during a run and processing 
them later when time becomes less restrictive, is a 
desirable feature in any hybrid system. In addition to 
providing information for analysis purposes, it is very 
useful for dynamic debugging. Two essentially distinct 
functions are associated with a dynamic dump capa- 
bility. The first involves the specification of those 
variables which are to be dumped and the actual 
performance of the dump during execution. The second 
involves the capability to display either the same 
variables that are dumped or a set of variables which 
are derived from the original variables by a user written 
processing program. Three user requirements affect the 
specification of the dump function. First, he must 
have freedom to specify those variables which he wishes 
to dump and the frequency at which they are to be 
saved. Second, for ease in interpreting the results, 
the values dumped should be coherent in time. That 
is, all values saved from a given interrupt level should 
represent functions of the same time, otherwise, a 
time skew in interpretation of the results will occur. 
Third, if post run processing of data is present, the
user must be allowed to easily specify the form of
processing and a display list that is different from
the list of variables dumped.

Mode control and interrupt processing

In considering the most suitable form for mode 
control and interrupt handling routines, the situation 
is somewhat different from that of inter-console data 
transfer. Usually the programs necessary to handle 
these functions are very hardware dependent and 
generally so complex that only a highly experienced 
programmer can adequately cope with the problems
involved. Here the obvious approach to specifying 
executive requirements is to remove flexibility, and 
therefore, the need for user intervention from the 
system. Some user control is necessary, however, and 
the amount of flexibility allowed by the executive 
should be sufficient to satisfy his reasonable needs. 
Certainly the user must be permitted to specify what 
subprogram he wants executed when the computer is 
in a given mode or when a particular interrupt occurs. 
He must also be able to specify the priority of each 
interrupt. It is also reasonable that an executive should 
expect the user to specify the frequency and perhaps 
the source of an interrupt. Beyond these few items, it 
should not be necessary and, in fact, it is not desirable 
for the user to intervene in the operation of mode or 
interrupt control software. The other user consideration 
that should be included is a “no penalty clause”. Thus, 
if a user requires only three interrupt levels, he should 
not be required to inform the system that the other 
available levels are not required. In general, the user 
should only be required to specify those items which 
he needs for solving his problem. 

The procedure required to specify the specific inter- 
rupt structure for a given problem is as follows. The 
address of a list is passed as a parameter to a standard 
executive routine which tailors a general interrupt 
structure to meet the user's requirements. The list
includes the names of the subprograms included in 
the real time program and the names of their associated 
data blocks. A similar list method is used to specify 
routines that are to be executed when the computer 
is placed in a given mode. 

The standard executive routine is written such that 
it completely handles all normal mode control and 
interrupt servicing. Dummy subroutine calls are 
included to allow user definition of special mode or 
interrupt routines. During initialization the executive 
extracts information from the lists described above 
and modifies the dummy calls with appropriate user 
supplied routine addresses. Similar dummy instructions 
are used to permit generalization of other functions. 
Since all dummy entries initially consist of “NOP” 
instructions, failure to specify all modes or interrupt 
levels will not affect execution. 
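The dummy-call idea can be illustrated with a table of handlers that all start out as no-operations and are overwritten only for the levels the user names. The class and method names are hypothetical, and a Python callable stands in for a patched subroutine address; the limit of eight levels is the one quoted earlier for the TRW hybrid system.

```python
def nop(*args):
    """Default entry: does nothing, like an unmodified NOP instruction."""
    return None

class InterruptTable:
    def __init__(self, levels=8):
        self.handlers = [nop] * levels   # every level starts as a dummy call

    def tailor(self, assignments):
        """Replace dummy entries with user routines: (level, routine) pairs."""
        for level, routine in assignments:
            self.handlers[level] = routine

    def service(self, level):
        return self.handlers[level]()
```

A user who needs three interrupt levels lists only those three; the remaining entries stay as no-operations, which is the “no penalty clause” in practice.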


It was claimed earlier that the structures described 
serve to simplify the preparation of individual sub- 
programs and the correlation of these into a unified 
system. A review of the necessary steps will illustrate 
this. While writing a subprogram, the user must only 
be aware of the name assigned to the subprogram, the 
name of the data block associated with it, and the 
names of the executive sub-routines he wishes to call. 
The total number of names needed in the TRW exec- 
utive is six. 

Integration of the subprograms demands very little 
more from the user. Before final compilation of the 
real time program, lists defining the mode control 
and interrupt structure must be prepared. Since this 
is done very late in the development of the program, 
all information is readily obtainable. Preparation of 
lists describing the details of interrupt priorities, exe- 
cution intervals, etc., may be left until computer runs 
are planned. Since the entire problem should be well 
defined at this time, little difficulty is encountered in 
selecting specific values for these parameters. 

Static check 

A major task which must be performed in any 
simulation is static verification of both the hardware 
used and the program being executed. An effective 
digital program can greatly aid in carrying out many 
parts of this task. The items that can be provided by 
the digital computer system for static checking are: 

• Initialization of the system using parameter values 
chosen for the check. 

• Comparison of values determined from
the physical equations in the digital with
those values sampled from the analog.

• Information useful in verifying the validity of
the equation values computed by the digital,
that is, debugging aids.

At TRW, the first requirement is met by the normal 
executive system used for analysis runs. When a static 
check is requested, normal run setup procedure is 
followed to the end of the initialization phase of the 
real time program (Figure 2). At this point, the static 
test request is recognized and execution of the static 
check program begins. Using this method of establish- 
ing the check case provides the advantages of con- 
venience and flexibility in three ways. First, it allows 
rapid switching to the check mode using actual run 
values if a problem arises during analysis. Second, 
after the check is made and the problem corrected, 
the return to normal running conditions requires 
absolutely no action. Third, the system allows rapid
definition and execution of several different check 
cases. All that is necessary to perform a static check 
is the entry of desired parameter values, using exactly 
the same methods as any other analysis run, and a 
request for execution of a static check. Since defining 
a single check case that effectively verifies an entire 
analog program is virtually impossible, the ability to 
perform a series of checks is very important. Pro- 
ceeding through the initialization phase of the real 
time program has the advantage that the ADC and 
DAC values which are sampled and presented during 
initialization represent realistic problem values. This 
is sufficient to complete the set of values needed to 
base the entire static check on direct evaluation of 
the physical equations. 

The static check overlay consists of two programs. 
The first is a FORTRAN subroutine in which the user 
codes his equations for use as the check reference. 
Because the system is dependent upon having access 
to normal run parameters which are stored in the 
computer COMMON area, the use of FORTRAN was 
a natural choice. Also, the use of FORTRAN rather 
than an interpreter program does not constrain the 
user in coding the analog equations, as encountered 
in some executive approaches. 

The second program comprises the executive part 
of the overlay (Figure 4). It compares the equation
values computed in the user FORTRAN program with 
the output of an analog component and generates 
appropriate error messages. The address of the analog 
component, the analog scale factor, and two indices 
which are used to correlate the component and the 
appropriate equation value are entered as a data 
record. Since only physical equations are coded in the 
FORTRAN program, recompilation is necessary only 
if these equations are redefined. 

The correlation of an equation to an analog com- 
ponent and scale factor is achieved by using two indices 
specified in the FORTRAN routine as follows. The 
terms or factors of an equation that appear at a par- 
ticular analog output are coded individually and stored 
in a one dimensional array. The section of coding 
for each equation is identified by a statement number 
or index. The statement number and array index are 
then included on a data card with the component 
address and scale factor to provide the necessary 
correlation. Since the computation of the terms of an 
equation is done only after a complete set of data 
cards for a given equation is read, the array used need 
only be large enough to store all of the values computed 
for the largest equation. 
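
The correlation scheme described above can be sketched in a few lines of Python. This is only an illustration and not the TRW code: the record layout, the names `CheckRecord`, `static_check`, and `read_component`, and the tolerance value are all assumptions introduced here. For brevity the sketch evaluates an equation's terms on first use rather than after a complete set of data cards has been read.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CheckRecord:
    """One data card: ties an analog component to an equation value."""
    address: str          # analog component address (format assumed)
    scale_factor: float   # analog scale factor
    statement: int        # statement number identifying the equation
    index: int            # index into the array of computed terms


def static_check(
    records: List[CheckRecord],
    equations: Dict[int, Callable[[], List[float]]],
    read_component: Callable[[str], float],
    tolerance: float = 0.01,  # assumed; the paper states no tolerance
) -> List[Tuple[str, float, float]]:
    """Return (address, expected, measured) for each component in error."""
    errors = []
    terms: Dict[int, List[float]] = {}
    for rec in records:
        # Evaluate the terms of each equation once, then reuse them.
        if rec.statement not in terms:
            terms[rec.statement] = equations[rec.statement]()
        expected = terms[rec.statement][rec.index] * rec.scale_factor
        measured = read_component(rec.address)
        if abs(expected - measured) > tolerance:
            errors.append((rec.address, expected, measured))
    return errors
```

Passing a different `read_component` function is one way to model a run without an analog computer present, since the comparison logic does not depend on where the measured value comes from.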



Figure 4 — Static check control 


The executive also provides user options that allow 
extensive verification of the user program and the 
data files without requiring the presence of an analog 
computer. Such verification is usually not available in
interpreter programs. The first option is a data card
editing function that detects obvious format and key- 
punching errors. The second performs the normal 
static check procedure but replaces the interrogation 
of the analog computer with a printout of both the 
actual and scaled equation values. This data may 
then be used to do off-line debugging of the static 
check routine. Use of this feature can assure that only 
analog program debugging will be necessary when the 
analog computer is finally checked. 

Other options are available to provide flexibility. 
One allows a choice between checking all analog com- 
ponents or just components that represent the total 
value of an equation. In the latter case, an error in the 
final equation value will direct the program to check 
all of the terms of that equation. A second option 
permits skipping a check of selected parts of the 
program. 

Operating procedures

Operationally the TRW executive has proven very 
effective. The entire procedure for executing a com- 
puter run consists of entering desired parameter values 
and a single command “RUN”. From that point, all 
setup, operating, and display functions are performed 
automatically in a manner predefined by simple list 
inputs. The provision that analog setup routines have 
access to normal data parameters is of course the key 
to making such a simple run procedure possible. 

Future hybrid executive development 

As was indicated earlier, the TRW executive was
developed through a process of evolution under the
pressure of concurrently developing a large simulation.
Although the operational characteristics which have 
resulted from this evolution are generally very good, 
many of the systems software aspects leave room for 
development. With a recent expansion in the number 
of systems software personnel at TRW, it is now possi- 
ble to reimplement the executive on a sounder systems 
basis and integrate it into a more comprehensive soft- 
ware system. As proposed, the new system will provide 
a multi-user capability, simplified file processing, a 
more powerful I/O structure, accounting control and 
extensive debugging aids. 

INTRODUCTION

The application of computers to the delivery of patient 
care is more a problem of “data management” than of 
“data processing.” Although calculations and interpre- 
tation of data are often required, of much greater 
concern are the problems involved in the collection, 
communication, coordination, and presentation of 
information. As the process of delivery of medical care 
becomes increasingly complex, and involves increasing 
numbers of professional and nonprofessional personnel, 
responsibility for achieving the continuity and compre- 
hensiveness that is essential to medical care seems to 
rest heavily on the development of appropriate com- 
puter-based data management systems. Such systems 
may further provide the primary feasible means by 
which quality control, auditing of the medical care 
process, and research into the diagnosis and treatment 
of disease can be achieved. 

These functions now are dependent on the use of 
the patient medical record, although they are fulfilled 
only to a minimal extent by it. Despite changing 
functions and increased demands on it, the medical 
record has changed little in form over the past century. 
Medical records possess no organization by diagnostic 
or therapeutic problem; notes relevant to a particular 
aspect of a patient’s health may be accessed only by 
leafing through an entire volume. Terminology is not 
standard, data is not organized in well-defined formats, 
and notes are often illegible. As a consequence, the 
objective of using the computer for clinical data 
management is gaining considerable impetus. 

This paper will describe a number of criteria which
the authors have found to be important in the design
of systems for clinical data management, and a novel
system which has been implemented to meet these
requirements. The system to be described has been in
operation for over a year. The extent to which it has
proved useful has led the authors to believe that the
criteria defined have general applicability for clinical
data management. In the discussion that follows, the term
“clinical data management system” refers to a time- 
shared computer system which supports on-line input, 
inquiry, and retrieval of clinical information from a 
central data base. 

Design and implementation 

The internal design of an information system dictates 
constraints on the external attributes of such a system. 
The characteristics that must be resolved include the 
number, priority, and level of responsiveness of the 
users, both active and inactive; the ratios among CPU 
time, connect time, and input/output time; the struc- 
ture, magnitude, and timeliness of file information; 
the profile of application programs in regard to size, 
type, and interactiveness; user requirements for de- 
velopment and service modes of operation; and finally, 
the overall economic justification for the system. 

High level programming language 

One of the most time-consuming aspects of the 
development of information system programs involves 
the optimal interfacing of the system with its users in 
a particular application area. This requires much 
attention to human engineering, and repeated modification
and revision of programs. The implementation of
clinical data management applications has generally
begun on relatively small computers. This has, in many 
cases, been necessary because development was a 
gradual process and started with limited objectives. 
Since high level languages have not typically been 
available on small machines, most programming has 
been done in machine language. 

The expense and inefficiency of writing, debugging, 
and modifying such programs have been serious ob- 
stacles to active research and development. A few 
clinical data management systems have used large 
general purpose computers which could provide much 
increased flexibility. However, the overhead of a large 
operating system on a major computer has often 
seemed excessive, because of the rather small amount 
of processing involved in many of these applications. 
Furthermore, because of the reliability requirements of
a clinical data management system, modularity and
duplication of hardware are desirable and often essential.
Because of the expense entailed by hardware redundancy,
this is typically feasible only with inexpensive,
minimal equipment configurations.

The MGH Utility Multi-Programming System
(MUMPS) is a compact time-sharing system on a 
medium scale computer, dedicated to clinical data 
management applications. It is currently implemented 
on a PDP-9 (Digital Equipment Corporation) with 
24,000 words of 18 bit memory and a Burroughs fixed 
head disk with three million characters of storage 
capacity. A set of terminal scanners is used to inter- 
face to remote devices: teletypes, buffered display 
scopes, line printers, card readers, and A/D converters. 
Both memory size and peripheral storage capacity can 
be expanded in the system. In the current version, 16 
users may run simultaneously.

All application programs in this system are written 
in a high-level interpretive language, a distant ancestor 
of which is JOSS,1 developed at the Rand Corporation
in 1964. It has also been influenced by related languages 
such as STRINGCOMP (developed by Bolt, Beranek 
and Newman, Inc.), and FILECOMP (specified by 
Medinet Division of General Electric Corp.). The 
MUMPS language allows the programmer to write a 
program, debug it, edit it, run it, and modify it
concurrently during an interactive session at a console.
The interpreter itself is a part of the executive system 
and is re-entrant. The total space taken up by the 
time-sharing monitor, the I/O monitor, buffers, and 
re-entrant interpreter is currently about 8,000 words 
of memory. The time-sharing and I/O monitors have 
been specifically tailored to work efficiently with the 
interpreter. No attempt has been made to accommodate 



Figure 1 — A schematic diagram of the core memory 
allocation of the MUMPS system and user partitions. A single 
partition is expanded to show its internal structure. The use of 
secondary storage (disk) for global data and inactive programs 
is represented. 


machine language user programs. All active users are 
assigned partitions of core memory. Activating a 
program consists of finding an available partition and 
bringing the program into it from disk; as long as it 
remains active, it occupies its partition. Core and disk 
storage allocation are depicted in Figure 1. 

The basic orientation of the language is procedural, 
much like FORTRAN and ALGOL. The largest unit
of a program is a group of statements called a “part” 
indicated by an integer part number. A single line or 
statement of the program is a “step”; it is identified 
by a step number consisting of a decimal fraction 
appended to the part number. Multiple commands may 
be entered in a single step and executed one after 
another. A conditional statement which when evaluated 
has a false value will, however, cause the rest of the
commands in that step to be ignored. Commands may 
be stated in a long mnemonic form, or for the experi- 
enced user, in a much more compact form in which 
only the first letter of the command is used. A state- 
ment preceded by a step number is considered to be 
in “indirect” or “program” mode, and is stored to be 
executed as part of a program. A statement without a 
step number is in “direct” mode, which indicates that 
it is to be executed immediately after it is entered 
from the user terminal. 
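
The part and step structure can be modeled briefly. The following Python sketch is an illustration under stated assumptions, not the MUMPS interpreter: the helper names `if_`, `do_`, and `run_part` are hypothetical, and a program is represented simply as a dictionary from step number to a list of commands. It shows the one rule stated above that is easy to misread, namely that a false conditional abandons only the remaining commands of its own step.

```python
from typing import Callable, Dict, List

# A command returns False only when it is a conditional that failed.
Command = Callable[[dict], bool]


def if_(test: Callable[[dict], bool]) -> Command:
    """Conditional command: a false test abandons the rest of the step."""
    return lambda env: bool(test(env))


def do_(action: Callable[[dict], None]) -> Command:
    """Ordinary command: always lets the step continue."""
    def run(env: dict) -> bool:
        action(env)
        return True
    return run


def run_part(program: Dict[str, List[Command]], part: int, env: dict) -> None:
    """Execute the steps of one part in ascending step-number order."""
    steps = [s for s in program if s.split(".")[0] == str(part)]
    for step in sorted(steps, key=float):
        for command in program[step]:
            if not command(env):
                break  # skip the remaining commands in this step only


program = {
    "1.10": [do_(lambda e: e.update(x=5))],
    "1.20": [if_(lambda e: e["x"] > 10), do_(lambda e: e.update(big=True))],
    "1.30": [do_(lambda e: e.update(done=True))],
    "2.10": [do_(lambda e: e.update(other=True))],
}
env: dict = {}
run_part(program, 1, env)
```

After the run, step 1.20 has set nothing because its condition was false, while step 1.30 has still executed; part 2 is untouched because only part 1 was requested.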

Interface flexibility 

Clinical information about a patient derives from a 
variety of sources — the patient, the attending physician, 
consultants, the radiologists, the clinical laboratory, 
etc. Problems of using the computer to obtain infor- 
mation from each of these sources have begun to re- 
ceive attention. Perhaps the most widespread activity 
of this type has been the development of systems for 
clinical laboratory information processing.2,3,4,5

With the exception of laboratory data, which is 
either numeric or simple text, much of the clinical 
information in the medical record is generally recorded 
in narrative or free text form. Most investigators are 
convinced that natural language is not in general 
suitable for computer record keeping applications, 
except perhaps in certain circumscribed areas with 
limited vocabulary and syntax.6,7 As a result, there is a
significant amount of work currently being devoted to
the development of methods for structuring this
narrative data.8,9,10 It is generally recognized that this
may be best achieved by introduction of new ways of 
capturing such information, e.g., entry of data by use 
of check lists, forms, or direct user-computer dialogue. 
Interactive dialogues for the capture of narrative data 
may be based on hierarchical organization and presen- 
tation to the user of the subject material. Any particular 
topic may then be pursued to an arbitrary depth, by 
means of a succession of increasingly discriminating 
selections by the user from the options presented. A 
variety of programs for interactive acquisition of clini- 
cal data have been developed, and have generated 
needs for special terminals, display formats, and conver- 
sational languages. Conversational programs have, for 
example, been devised for the on-line acquisition of a 
patient’s medical history.11,12 Other systems aimed
primarily at the physician have been designed for the
purpose of entry of physical examination notes,13 the
recording of progress notes, or the generation of X-ray
reports by the radiologist.14,15 In the development of
such applications, the emphasis is placed primarily on
the interface (hardware, software, and environmental)
of the system with the individuals who have to use it.

As the potential of clinical data management systems 
is recognized, they will be called upon to fulfill a diver- 
sity of output functions, e.g., the display of reports or 
summaries, organized chronologically or topically, the 
production of tables or graphs. Information obtained 
by dialogue must often be translated into more precise 
medical terminology, or compacted into coded repre- 
sentations. Flexibility in output and presentation of 
information, as well as in its acquisition, is essential. 

The philosophy of MUMPS has emphasized the need 
for ease in interfacing and adapting programs to the 
requirements of the application. Programs written in 
the interpretive language do not require any compiling 
or assembling. Error comments during execution are 
typed out at the user’s console, and allow quick re- 
covery, modification of the program, and reexecution 
of it. All debugging and modification is done in the 
same language in which the program is written and 
can be done entirely from the user terminal. This 
makes modification especially convenient, particularly 
in a service environment where the trouble shooting 
necessary to interface a program with an application 
area is a time consuming process. The MUMPS environ- 
ment allows a programming session to take the form of 
a conversational dialogue between the programmer and 
the terminal device, thus minimizing the user’s time in 
programming a problem, the computer’s time needed 
in checking it out, and most important, the elapsed 
time required to obtain a final running application 
program. 

Text handling capabilities 

The complexity and variety of data that must be 
handled in a clinical information system impose a 
number of requirements on the system. A considerable 
amount of information that is input is in the form of 
text strings of variable length. The processing of input 
often requires syntax checking or limit checking. String 
comparisons, extractions, and concatenations need to 
be performed. When special driver languages or moni- 
tor subsystems are employed to control dialogues 
between the user and the computer, string processing 
capabilities are mandatory. Most existing higher level 
languages do not provide the needed combination of 
algebraic and boolean expression handling capabilities 
with the ability to handle string information. 

The MUMPS language has been designed to meet 
this need. In addition to algebraic and boolean pro- 
cessing, a MUMPS program can perform string ex- 
traction, locations, comparisons, and checking of 

- WRITE 1

1.10 READ !,"UNIT NO. ",X

1.15 IF 'X:3N"-"2N"-"2N TYPE " ILLEGAL" GOTO 1

- DO 1

UNIT NO. 123-45-67B ILLEGAL

UNIT NO. 12-345-67 ILLEGAL

UNIT NO. 123-456-78 ILLEGAL

UNIT NO. 123-45-67

Figure 2 — A portion of a MUMPS program to input 
a seven digit unit number from the teletype (accomplished by
step 1.10). The value entered is stored as the variable named X;
a check is made that X has the correct form, i.e., 3 digits, followed 
by a hyphen, 2 digits, a hyphen, and 2 more digits (step 1.15). 
Improper values cause an error message, and request of a new 
value. The WRITE command lists the statements. The DO 
command causes execution, which is illustrated. (In this and 
other figures, user input is underlined to distinguish it from the 
response of the computer.) 

syntax and form of information. These features are 
illustrated in Figures 2 and 3. Figure 2 shows a portion 
of a program written in MUMPS to read a hospital
unit number from a Teletype (i.e., entered by a user),
to check its syntax, and to reject any improperly 
formatted responses. Figure 3 shows statements in a 
program for the clinical chemistry laboratory, which 
permit entry of a test name and its result. Checks are 
made on the legality of the test name and the reason- 
ableness of the result. Some of the interactive editing 
capabilities are shown in the figure. 
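
For readers unfamiliar with the notation, the two string facilities used in Figures 2 and 3 can be approximated in Python. This is a rough sketch under assumptions, not a description of the MUMPS interpreter: the functions `pattern_to_regex`, `matches`, and `piece` are hypothetical names, and only the two pattern elements that appear in Figure 2 (a count followed by N for digits, and a quoted literal) are handled.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate a pattern such as 3N"-"2N"-"2N into a regular expression.

    Only two element kinds are handled here: a count followed by N
    (that many digits) and a quoted literal string.
    """
    regex = ""
    for count, literal in re.findall(r'(\d+)N|"([^"]*)"', pattern):
        regex += rf"\d{{{count}}}" if count else re.escape(literal)
    return regex


def matches(value: str, pattern: str) -> bool:
    """True if the whole value has the form described by the pattern."""
    return re.fullmatch(pattern_to_regex(pattern), value) is not None


def piece(source: str, i: int, delimiter: str = ",") -> str:
    """Return the i-th delimited substring, counting from 1."""
    parts = source.split(delimiter)
    return parts[i - 1] if 1 <= i <= len(parts) else ""


UNIT_PATTERN = '3N"-"2N"-"2N'
for entry in ["123-45-67B", "12-345-67", "123-456-78", "123-45-67"]:
    print(entry, "" if matches(entry, UNIT_PATTERN) else "ILLEGAL")
```

The loop repeats the four entries of Figure 2; only the last has the required form. A test-name lookup in the manner of Figure 3 would call `piece` for successive values of the index until the returned substring equals the name entered.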

Terminal device flexibility 

An important feature of the language is its input/ 
output scheme, which permits programs to be written 
independently of the particular device for which one is 
programming. One may use any device for which the 
hardware system has been appropriately interfaced by 
merely assigning a device number to a system variable 
indicating the device to be utilized. This makes it 
possible to generate a report on a display scope, for 
example, and then to use the same program to type 
out the report on a typewriter, merely by changing, 
during execution, the value of the device number 
assigned to the input/output variable. Formatting and 
control of position on a page are made very simple by 
utilization of special format characters and variables 
indicating current position and line spacing. 
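
The idea of selecting a device through a single variable can be illustrated as follows. This Python sketch is an assumption-laden model rather than the MUMPS input/output scheme itself: the class `Terminal`, the device numbers 1 and 2, and the sample report text are invented for illustration, and "!" is treated as a format character that starts a new line, as its use in the figures suggests.

```python
import io
from typing import Dict, TextIO


class Terminal:
    """Routes output through a device number held in one variable."""

    def __init__(self, devices: Dict[int, TextIO]) -> None:
        self.devices = devices
        self.current = min(devices)   # device number now in effect
        self.column = 0               # current position on the line

    def type(self, *items: str) -> None:
        out = self.devices[self.current]
        for item in items:
            if item == "!":           # format character: start a new line
                out.write("\n")
                self.column = 0
            else:
                out.write(item)
                self.column += len(item)


def report(term: Terminal) -> None:
    """A report routine that never names a particular device."""
    term.type("!", "SODIUM", "  ", "140")


scope, typewriter = io.StringIO(), io.StringIO()
term = Terminal({1: scope, 2: typewriter})
report(term)          # goes to device 1
term.current = 2      # switch devices during execution
report(term)          # same program, different device
```

The report routine is unchanged between the two calls; only the value of the device variable differs.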

Multi-user access to a central data base 

A major requirement of a clinical data management 
system is that the information stored be accessible to a 
variety of users concurrently. Access may be from a 


2.05 SET DCT="CA,P,FBS,CHOL,TP,NA,K,CL,CO2,SGOT,LDH,[illegible],BUN,CRE"
2.10 READ !,"TEST: ",TES

2.20 FOR I=1:1:14 IF $PIECE(DCT,I)=TES QUIT GOTO 2.3

2.25 TYPE " ???" GOTO 2.1

2.30 ASK !,"RESULT= ",RES GOTO I+3

2.40 READ " PROP. ERROR...OK? ",X IF 'X["Y" GOTO 2.3
2.50 DO 100 TYPE ! GOTO 2.1

9.10 IF RES>160!RES<120 GOTO 2.4
9.20 GOTO 2.5

- DO 2

TEST: ???

TEST: NA

RESULT= 125

TEST:

? 2.10 IOINT

- 9.1 IF RES>150!RES<130 GOTO 2.4
- DO 2

TEST: NA

RESULT= 125 PROP. ERROR...OK? Y

TEST:

Figure 3 — A section of a MUMPS program that might 
be used in a clinical chemistry laboratory information system. 
Step 2.05 sets the variable DCT to the list of test determinations 
that are valid for this particular laboratory. Step 2.10 then accepts 
a test name from a technician. The $PIECE function in step 
2.20 then extracts substrings (between commas) from DCT and 
compares them to the variable TES whose value is the test name
entered. It does this repeatedly for values of I = 1, ..., 14 until
a match is found; at this point the iteration is terminated and 
execution continues at step 2.30. If no match is found, an error 
comment is printed (step 2.25) and step 2.10 is repeated. Step 
2.30 accepts a test result, and goes to a part in the program 
dependent on the particular value of I for which the match was 
found. 

Part 9 illustrates a specific check for results entered for the test 
name, NA (in which case I = 6). The result is compared to
prescribed limits, in step 9.10, and if it exceeds either limit, con- 
trol goes to step 2.40. Here the user is asked to verify the value. 
The user’s response is inspected to see if it contains a “Y”, in 
which case a YES response is implied. Otherwise, a new result 
is requested, in step 2.30. If either the user verifies it, or the 
result is within limits set by step 9.10, control goes to step 2.50. 
Step 2.50 calls part 100 to file the value and then returns to step
2.10.

The DO command causes execution, which illustrates operation 
of the program. Note that the user has interrupted the program 
from his teletype (indicated by the “? 2.10 IOINT” error com- 
ment, showing where the interrupt occurred). In this case, a 
programmer has decided to edit the program to make the limits 
for a sodium determination more stringent, by resetting step
9.10. The program is then re-executed.

variety of terminals, by a variety of programs in the 
system, at varying frequencies. Among the possible 
purposes for accessing a file might be to report a 
laboratory result, to enter an X-ray impression, to 
record a progress note, or to enter a specific inquiry. 
Although many of these activities occur independently, 
they must share a common data base. Nevertheless, 
manipulation of the data base must occur without 
time sharing conflict, such as might occur if two users
were to update a portion of the data base simultaneously.
Without special provision, this might result in
loss of information.

Efforts to develop specialized clinical data manage- 
ment applications are still relatively primitive. There 
have been very few concerted efforts devoted to the 
general problem of management of medical record data, 
the development of integrated patient data files, and 
the implementation of systems for long term storage 
and retrieval of this data.16,17 Among the difficulties
faced by the few developmental efforts that have been 
undertaken have been the lack of generality in their 
approaches, and the reliance on highly specific program- 
ming languages, file structures, and file handling 
routines. 

MUMPS provides application programs with the 
ability to create and utilize their own “local” data, 
as well as to manipulate “global” data, shared by 
other programs in the system. Local data utilized by 
a program is referenced symbolically, and space for 
it is allocated as needed. Local data is that set of vari- 
ables established within the domain of a particular 
program, and available and defined only within that 
program. The data actually resides within the user 
partition, and functions as scratch or transient data. 
Local arrays are assumed to be sparse or of varying 
dimensions, and only subscripts for which data are 
defined are allocated space. A symbolic variable used 
in a program may be given either a numerical value or 
a variable-length string value. When it has a string 
value, only that space required by the string is actually 
allocated. Thus for both strings and sparse arrays, the 
overhead of a compiler system does not exist, in which 
typically maximum sizes of arrays and maximum 
lengths for string variables must be allocated. 
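
The storage behavior described for local data resembles a symbol table keyed by name and subscripts. The Python sketch below illustrates that idea only; the class `LocalData` and its methods are hypothetical and say nothing about how the MUMPS partition is actually laid out.

```python
from typing import Dict, Tuple, Union

Value = Union[float, str]


class LocalData:
    """Symbol table in which only defined elements occupy space."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, Tuple[int, ...]], Value] = {}

    def set(self, name: str, value: Value, *subscripts: int) -> None:
        self._cells[(name, subscripts)] = value

    def get(self, name: str, *subscripts: int) -> Value:
        return self._cells[(name, subscripts)]  # KeyError if undefined

    def defined(self, name: str, *subscripts: int) -> bool:
        return (name, subscripts) in self._cells

    def cells_in_use(self) -> int:
        return len(self._cells)


local = LocalData()
local.set("X", "123-45-67")        # string value of arbitrary length
local.set("A", 7.5, 3)             # A(3)
local.set("A", 2.0, 1000, 2)       # A(1000,2): no space reserved for gaps
```

Three cells are in use after these assignments, even though one subscript is 1000; nothing is reserved for the undefined elements in between.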

This philosophy is extended to the management of 
data on the random access disk. Elements stored in 
data files are referenced entirely symbolically; the 
file name is similar to that of a local variable name in 
a program. Fields in the data file are treated as array 
elements and referenced by means of subscripts; sub- 
fields are referenced by appending additional sub- 
scripts. Data files on the disk thus comprise an external 
system of arrays, which provide a common data base 
available to all programs. The arrays which make up
this external system are called global variables, and
are identified by global array names. A global name
(or file name) consists of the character up-arrow (↑)
followed by at least one alphabetic character. The 
form of the subscript portion of an array reference 
consists of an arbitrary number of numeric expressions 
separated by commas and enclosed by parentheses. 

To avoid time-sharing conflicts, a program may 


prevent other programs from having access to one or 
more global arrays which it is in the process of altering 
in some way, by the use of the command OPEN. The 
argument of OPEN may be one array name or a list 
of array names. OPEN prevents any other program 
from altering data in any of the specified arrays. The 
effect of OPEN is cancelled when the program ends or 
at the occurrence of the command CLOSE, which does 
not require any arguments, and releases all opened 
arrays to other users in the system. 
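
The effect of OPEN and CLOSE can be modeled with a small ownership table. The Python sketch below is an illustration under assumptions: the class `GlobalLocks` is hypothetical, and the text does not say what happens when a program tries to open an array already held by another, so the sketch simply refuses the whole request in that case.

```python
from typing import Dict, Iterable


class GlobalLocks:
    """Tracks which program has opened which global array."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}   # array name -> program id

    def open(self, program: str, names: Iterable[str]) -> bool:
        """Claim every named array, or none if any is held by another."""
        names = list(names)
        if any(self._owner.get(n, program) != program for n in names):
            return False                   # assumed behavior on conflict
        for n in names:
            self._owner[n] = program
        return True

    def may_alter(self, program: str, name: str) -> bool:
        """True unless some other program currently holds the array."""
        return self._owner.get(name, program) == program

    def close(self, program: str) -> None:
        """Release all arrays opened by this program (also done at exit)."""
        for n in [n for n, p in self._owner.items() if p == program]:
            del self._owner[n]
```

As in the text, `close` takes no array names: it releases everything the program has opened.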

Hierarchical data base organization 

A most important requirement for clinical data 
management is the ability to handle the several levels 
of structure of a medical record data base, and to 
support the rather complex updating and retrieval 
needs of such a system. An example of a typical patient 
data file, such as exists in the information system under 
development at the Massachusetts General Hospital, 
is illustrated in Figure 4. This indicates the typically 
hierarchical (tree-like) structure of the data base, which 
has both a topical and a chronological organization. 
Most computer systems currently available do not have 
the ability to utilize hierarchical file organizations 
conveniently. 

The global array facility in MUMPS has been de- 
signed to meet this need. The structure of global 
arrays is hierarchical, and any node within the array 
tree may possess a numeric or string data value and/or
a pointer to a lower level in the tree. Data may be
stored at any level, and there are no constraints to
the dimension or the size of the array. In addition,
the quantity and magnitude of subscripts for an array 
are dynamic, so that not only may the content of an 
array change during usage, but also its structure may 
vary. 

Since modification of content and structure of a 
global array may be caused by a variety of programs 
in the system, a particular program must sometimes 
examine the current configuration of an array before 
attempting to access or update it. MUMPS provides a 
set of global array functions to determine the type 
and structure of a global array. These functions permit 
the programmer to locate the nodes where information 
is stored within an array, and nodes within the array 
which are empty and thus available for data storage. 

The storage of data into an array is accomplished 
solely by the assignment command, SET. Consider
the following statement:

SET ↑APR(UN,NAME) = "JOHN DOE", ↑APR(UN,AGE) = 34

Figure 4 — A tree-structured patient data file, indicating: (1) the use of certain levels in the tree to group information into
specific topics, e.g., basic identifying and administrative data, review of systems, physical examination, and (2) other levels to
group information into sets which differ by date or by some other sequencing field. 


Assume the global array name ↑APR is reserved for
the active patient record file. Each patient in the file is 
accessed through his hospital unit number, in this 
case, a local variable UN. Both NAME and AGE are 
also local variables whose values indicate particular 
categories represented by subscripts at the second level 
of the array. This statement then assigns the string 
value “JOHN DOE” and the numeric value 34 to the 
specific second level categories, name and age respectively.
Subsequently, a statement such as:

SET ↑APR(UN,CHEM,N) = DATE.",".TEST

might define the Nth laboratory test in the chemistry 
lab with the double field entry of the date concatenated 
(by means of the dot operator) with a comma and the 
test name. 

Retrieving data from global arrays is no different 
from retrieving data from local arrays. Both consist 
of ascertaining the value of a subscripted variable by 
using it within a numeric or string valued expression. 
The statement: 

TYPE "THE AGE OF ",↑APR(UN,NAME)," IS ",↑APR(UN,AGE)

will effect the printout:

THE AGE OF JOHN DOE IS 34

To print out a list of a patient's laboratory tests
(assuming ↑APR(UN,CHEM) is the total number
of tests defined) the following statement might be
used:

FOR I=1:1:↑APR(UN,CHEM) TYPE ↑APR(UN,CHEM,I)

The KILL command when applied to a specific 
node in a global array, prunes the array tree at that 
node. Any data value and/or array pointers to lower 
level nodes are removed, and that node reverts back 
to an undefined status. The statement KILL ↑APR(UN)
would delete all information for the patient
defined by the local variable UN. 

Included in the global array syntax is the “naked”
global variable. The form of the naked variable con- 
sists of the up-arrow followed by a subscript enclosed 
in parentheses. This notation is equivalent to the last 
previously used global array reference except that the 
value of the last subscript is replaced with the value 
of the subscript in the naked variable. For example, 
the statement: 

TYPE "THE AGE OF ",↑APR(UN,NAME)," IS ",↑(AGE)

is equivalent to the example cited earlier. 
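
The behavior of SET, retrieval, KILL, and the naked reference can be pulled together in one small model. The Python sketch below is illustrative only: the classes `Node` and `Globals` are hypothetical, the numeric codes given to UN, NAME, AGE, and CHEM are invented, and the in-memory tree stands in for what MUMPS keeps on disk.

```python
from typing import Dict, Optional, Tuple, Union

Value = Union[float, str]


class Node:
    def __init__(self) -> None:
        self.value: Optional[Value] = None        # data at this node, if any
        self.children: Dict[object, "Node"] = {}  # link to the next level


class Globals:
    def __init__(self) -> None:
        self._roots: Dict[str, Node] = {}
        self._last: Optional[Tuple[str, tuple]] = None  # for naked refs

    def _find(self, name, subs, create=False):
        self._last = (name, tuple(subs))
        if create:
            node = self._roots.setdefault(name, Node())
        else:
            node = self._roots.get(name)
        for s in subs:
            if node is None:
                return None
            if create:
                node = node.children.setdefault(s, Node())
            else:
                node = node.children.get(s)
        return node

    def set(self, name, subs, value):
        self._find(name, subs, create=True).value = value

    def get(self, name, subs):
        node = self._find(name, subs)
        if node is None or node.value is None:
            raise KeyError((name, tuple(subs)))
        return node.value

    def naked(self, sub):
        """Reuse the last reference, replacing only its final subscript."""
        name, subs = self._last
        return self.get(name, subs[:-1] + (sub,))

    def kill(self, name, subs):
        """Prune the tree at a node, discarding its value and descendants."""
        subs = tuple(subs)
        if not subs:
            self._roots.pop(name, None)
            return
        parent = self._find(name, subs[:-1])
        if parent is not None:
            parent.children.pop(subs[-1], None)


g = Globals()
UN, NAME, AGE, CHEM = 1234567, 1, 2, 3   # subscript values assumed
g.set("APR", (UN, NAME), "JOHN DOE")
g.set("APR", (UN, AGE), 34)
line = "THE AGE OF " + g.get("APR", (UN, NAME)) + " IS " + str(g.naked(AGE))
```

The string built on the last line reproduces the printout given above, with the age fetched through the naked reference rather than the full one.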

MUMPS requires that reference to all file infor- 
mation be done symbolically, in the syntax of hierarchi- 
cal global arrays. This replaces the classical manner 
of sequentially accessing record files on secondary 
memory devices. Instead, an attempt is made to logically
map the content and structure of the tree-like
data arrays into the physical storage medium of the 
system. The general technique is to map logical infor- 
mation at a specific level of an array into fixed size 
blocks chained together linearly to contain all the data 
values stored at that level, and all the pointer words 
which link it to the chains of the next lower level. The 
implementation of this design requires a careful con- 
sideration of the timing and size constraints of the 
physical device in relation to the overall system. The 
actual memory device used in the system is a large 
fixed head disk. The organization of this type of disk 
is two dimensional, wherein any physical block has a 
track and a segment coordinate. Initially a set of free 
lists are formed which chain all blocks possessing the 
same segment address together. Whenever a continua- 
tion block at the same level or a header block at a 
new level is required, the appropriate block in the free 
list whose segment address is a few segments away is 
utilized. This method makes it possible to trace down 
the many levels of a tree structure required to access 
a datum during a fraction of a disk revolution, in 
addition to the average access time of the disk unit 
required to reach the first level of the tree. As a conse- 
quence, the time required to retrieve a particular 
datum is virtually independent of the depth of sub- 
scripting required to specify the datum. Space is 
conserved by utilizing small sized physical blocks such 
that at any subscript level an average of one continu- 
ation block is required. When data is updated, care is 
given to repack and sometimes reorganize the individual 
data elements within a chain to insure maximum 
utilization of space for variable length data. Whenever 
a part of the global structure is deleted, it is passed 
to the garbage collector routines to be disassembled 
from tree-like chains back into linear chains and ap- 
pended to the appropriate free lists. This is done during 
periods of low CPU activity so as to avoid competition 
with the active programs. 
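
The free-list scheme just described can be illustrated with a short sketch. It is not the MUMPS implementation: the disk geometry, the two-segment lookahead, and every name in it (build_free_lists, allocate_next, release) are assumptions chosen for the example. It shows one free list per segment address, and the selection of the next block from a list a few segments ahead of the current one, so that successive levels of a chain can be followed within a single revolution.

```python
from collections import deque

NUM_SEGMENTS = 8    # assumed segments per track (hypothetical geometry)
NUM_TRACKS = 4      # assumed number of tracks
SEGMENT_GAP = 2     # assumed meaning of "a few segments away"


def build_free_lists():
    """One free list per segment address, chaining the block on every track."""
    return {seg: deque((track, seg) for track in range(NUM_TRACKS))
            for seg in range(NUM_SEGMENTS)}


def allocate_next(free_lists, current_segment):
    """Take a free block a few segments past the current one.

    Falls back to later segments when the preferred list is empty.
    Returns a (track, segment) pair, or None when no block is free.
    """
    for offset in range(NUM_SEGMENTS):
        seg = (current_segment + SEGMENT_GAP + offset) % NUM_SEGMENTS
        if free_lists[seg]:
            return free_lists[seg].popleft()
    return None


def release(free_lists, block):
    """Append a deleted block to the free list for its segment address."""
    free_lists[block[1]].append(block)


free_lists = build_free_lists()
header = allocate_next(free_lists, current_segment=0)
child = allocate_next(free_lists, current_segment=header[1])
print(header, child)   # (0, 2) (0, 4)
```

With these assumed values, a header block at segment 2 is followed by a lower-level block at segment 4, so both pass under the heads in the same revolution.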

Once a block of data accommodating a single level 
of subscripting is referenced, it is maintained in core 
memory until a reference is given to a different block 
by the program. Use of the naked variable then permits 
other data at the same level to be referenced merely 
by specifying a terminal subscript, so that once a level 
is reached, often no further disk access need be made 
to manipulate associated information. If any data in 
a block is altered, the block is written back to the disk 
only when a reference is made to a block other than the 
one that is in core memory, or when a CLOSE command 
is given. 
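
The buffering rule in the preceding paragraph amounts to a one-block write-back buffer. The sketch below is an illustration under stated assumptions, not the MUMPS code: the disk is modeled as a dictionary, and the class name BlockBuffer and its methods are hypothetical.

```python
class BlockBuffer:
    """Keeps one disk block in core; writes it back only when it must."""

    def __init__(self, disk):
        self.disk = disk          # assumed model: block id -> {subscript: value}
        self.block_id = None
        self.data = None
        self.dirty = False
        self.disk_reads = 0
        self.disk_writes = 0

    def _flush(self):
        # Write the block back only if it was altered while in core.
        if self.dirty:
            self.disk[self.block_id] = dict(self.data)
            self.disk_writes += 1
            self.dirty = False

    def _load(self, block_id):
        # A reference to a different block forces write-back, then a read.
        if block_id != self.block_id:
            self._flush()
            self.data = dict(self.disk.get(block_id, {}))
            self.block_id = block_id
            self.disk_reads += 1

    def get(self, block_id, subscript):
        self._load(block_id)
        return self.data.get(subscript)

    def set(self, block_id, subscript, value):
        self._load(block_id)
        self.data[subscript] = value
        self.dirty = True

    def close(self):
        """Counterpart of the CLOSE command: force any pending write-back."""
        self._flush()


disk = {"patients": {1: "SMITH"}, "labs": {}}
buf = BlockBuffer(disk)
buf.set("patients", 2, "JONES")   # first reference: one disk read
buf.get("patients", 1)            # same block: no disk access
buf.get("labs", 1)                # different block: write back, then read
buf.close()                       # nothing altered since, so no extra write
print(buf.disk_reads, buf.disk_writes)   # 2 1
```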


Large storage capacity 

The conversational environment in which a clinical 
data management system is designed to operate de- 
mands little computer processing power. When data is 
entered, a program need only check on its legality, 
decide where to file it, and select an appropriate re- 
sponse to the user. Generation of reports may involve 
manipulation of information from peripheral storage 
to assemble the data needed, but only a small amount 
of processing to actually format or produce the report. 
Large volumes of data need to be available for low 
level, low frequency usage. Thus one does not need 
computing power as much as the availability of pe- 
ripheral storage of large capacity. Much of the data 
may potentially be accessed at any time, and therefore 
needs to be stored on a random access device. Because 
of the large quantities of data that may be anticipated 
in such systems, it is necessary to provide hierarchies 
of peripheral storage, in which the access time of the 
storage device used is commensurate with the fre- 
quency or urgency of the need for retrieval. 

In MUMPS the fixed head disk provides fast random 
access storage, whereas slower access requirements are 
currently met by three DECtape units. A large movable 
head disk unit is being installed to permit intermediate 
access times for other data. 

Efficient time sharing 

In a conversational data management system, 
programs spend much of their time in an input/output 
hung status, i.e., doing disk activity or completing a 
transaction at a terminal. As a result, there is again 
not a large demand by a program for the central 
processor. In contrast to most numerical applications 
where central processing power is the limiting factor, 
in a conversational environment the time necessary to 
complete a task is often determined by the speed of 
the input/output equipment or the human response 
time at a terminal. As a consequence of the small 
demand for the central processor by an individual 
program, one can theoretically time share a large 
number of programs. Efficiency of the use of the central 
processor is in this situation determined by how rapidly 
the time-sharing monitor can change from one user 
to another. This swapping overhead is the delay before 
a particular user program can run after a previous 
user has quit the run state, due to an input/output 
hang, expiration of time slice, or termination of its 
task. When the central processor is not being fully 
utilized, swapping overhead tends to determine re- 
sponse time of the system. 



304 Fall Joint Computer Conference, 1969 


TABLE I — A comparison of execution times for various numeric processing 
examples in MUMPS and FORTRAN 

                                   CPU Time (Microseconds) 

Statement                          MUMPS    FORTRAN    MUMPS/FORTRAN RATIO 

FOR/DO (Iteration, per cycle)        250      12*             20.8 
1 + 2                                800       7*            114.3 
2*3                                  850      44              19.3 
1 + 2*3                             1050      48              21.9 
1 + 2 - 3*4/5                       1550     120              12.9 

* These are the only operations compiled by the PDP-9 FORTRAN Compiler as in-line code. All other operations 
beside integer addition (in DO loops and arithmetic expressions) are compiled as subroutine calls. 


In the MUMPS system, the use of a partitioned 
memory has been dictated by the overwhelming con- 
cern for response time. As a result of partitioning, the 
time sharing monitor can switch between users in 
minimum time without having to resort to swapping 
of programs in from a drum or disk. In addition, the 
monitor automatically overlays external program 
segments invoked by an active program. Proper link- 
ages are set up to return automatically to the invoking 
program when execution of a segment terminates. 

Execution speed of an interpretive program doing 
pure numeric processing may be slower by a factor of 
about 20 to 1 than corresponding code generated in a 
compiler or assembly language system. 

Table I illustrates some timing comparisons between 
a single user version of the MUMPS interpreter and 
the manufacturer-supplied FORTRAN compiler for 
this computer, for statements involving pure numeric 
processing activity of varying complexity. As has been 
indicated above, however, few programs do pure nu- 
meric processing in a clinical data management environ- 
ment. Input/output conversion in FORTRAN and 
most other compiler systems is handled in a purely 
interpretive fashion, and thus, for this activity, very 
little difference in the performance between the two 
kinds of systems may be expected. Furthermore, a 
significant part of the processing done by programs in 
clinical data management systems involves file manipu- 
lation, or text string processing activities; in all as- 
sembly or compiler language systems these functions 
are usually handled by the use of subroutines. There- 
fore, the employment of an interpreter as a means of 
generating calls to these subroutines rather than com- 
piling the calls themselves requires only a small amount 
of processing overhead. 
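
As a check on the arithmetic, the ratios in Table I can be recomputed from the two timing columns. The snippet below is only a verification aid; the dictionary name and labels are ours, while the timings are copied from the table.

```python
from statistics import median

# CPU times in microseconds from Table I: statement -> (MUMPS, FORTRAN)
timings = {
    "FOR/DO (per cycle)": (250, 12),
    "1 + 2": (800, 7),
    "2*3": (850, 44),
    "1 + 2*3": (1050, 48),
    "1 + 2 - 3*4/5": (1550, 120),
}

# Ratio of MUMPS time to FORTRAN time, rounded as in the table.
ratios = {stmt: round(m / f, 1) for stmt, (m, f) in timings.items()}
median_ratio = median(ratios.values())

for stmt, ratio in ratios.items():
    print(f"{stmt:20s} {ratio:6.1f}")
print("median ratio:", median_ratio)
```

The recomputed values agree with the published column, and the median of the five ratios is 20.8, which is consistent with the figure of about 20 to 1 quoted above. The one large ratio (114.3) belongs to integer addition, which the footnote identifies as compiled in-line by FORTRAN.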


The foregoing observations refer to comparisons 
between execution speeds of MUMPS interpretive 
language statements and compiler-generated object 
code on a single-user computer, with no other pro- 
cesses competing for the processor. More significantly, 
in a data management environment, a re-entrant 
interpreter such as MUMPS may provide the most 
economical means of achieving a highly responsive 
time-shared information system. In the MUMPS 
system with sixteen typical users active, response 
times (a most sensitive measure of efficiency in a time- 
sharing system) are always less than a second and 
usually appear instantaneous. 

There are several reasons that account for this, all 
of which are related to very efficient use of core storage. 
First, a typical program written in the interpretive 
language takes up 10 percent to 20 percent of the 
space taken up by the object code generated for a 
similar program written in a compiler language. Also, 
dynamic allocation of data and efficient storage of 
variable length strings and of sparse arrays are standard 
features of the interpreter. Thus data also take up 
considerably less space in this kind of environment. 
In addition, since the interpreter is re-entrant, all 
programs may share the same utility routines and 
operating system capabilities. This contrasts rather 
sharply with conventional compiler language operating 
systems, in which each running program must have 
its own copy of the necessary system routines that it 
will utilize. 

The significant advantage that results from the above 
features is that programs take up much less space; 
therefore, a partitioned memory system on a medium 
or small scale computer becomes feasible. Active 
programs are typically highly interactive, and are 
therefore doing only small amounts of processing 
between input/output requests. Therefore the time- 
sharing monitor is invoked frequently to pass control 
from one user to another, in order to utilize the central 
processor as much as possible. In a partitioned system, 
swapping of the users is very rapid. In systems that 
use various schemes for submerging disk or drum 
swapping, users that are running in a conversational 
mode often do not stay in the run state long enough 
to submerge the concurrent swapping process. There- 
fore potential CPU time is unavailable; this unused 
time may be on the order of 20 to 50 percent of the 
total amount available. The speed that results from 
not using disk or drum swapping appears, in our 
experience, to more than offset the overhead of interpre- 
tation, with greatly increased efficiency in the utili- 
zation of space. 

CONCLUSION 

The convenience occasioned by the utilization of a 
high level language with symbolic referencing capa- 
bility for data stored in complex tree structures on 
peripheral storage has greatly simplified the develop- 
ment of application programs for clinical data manage- 
ment. This is the only system that we know of, on a 
computer of medium or small scale, which supports 
such extensive file manipulation, string handling, and 
input/output flexibility. It is the only system we have 
encountered on any computer which allows all these 
manipulations to occur entirely in a high level language. 
This system has been used at the MGH for all of our 
programming research and development activities. 
Equally important, because of its compactness and 
efficiency in this environment, we use it for the imple- 
mentation of our service programs, including a chemis- 
try laboratory reporting system, 18 a patient history 
taking system, and a number of programs for physician 
entry of narrative record information. 

An advantage of this approach to clinical data 
management over the use of a large commercially 
available general purpose time-sharing computer 
with its complex operating system has been the in- 
creased flexibility that is possible with a specially 
designed system. This increased flexibility results 
because the system has been built to meet specific 
objectives, in contrast to having been implemented 
within the often arbitrary and inefficient constraints 
of a general-purpose time-sharing facility. In addition, 
with a special purpose system, it is possible to achieve 
the efficiency required for service operation with a 
computer whose size and cost are well matched to the 
requirements of the problem area. 


Medical education — A challenge for 
natural language analysis, artificial 
intelligence, and interactive graphics 


by J. C. WEBER and W. D. HAGAMEN 

Cornell University Medical College 
New York, New York 


INTRODUCTION 

In a functional sense, Computer Assisted Instruction 
(CAI) has not advanced from the primary grades, yet 
its implications for higher education cannot be ignored. 
Most of the work that has been done in CAI falls into 
the category of drill and practice or straight tutorial 
presentation. Logically, both the hardware and soft- 
ware that have been developed or modified to support 
CAI have been tailored with these goals in mind. In 
medical education, multiple choice questions would 
neither hold the interest of the average student nor 
challenge his intellectual abilities. Since we can formally 
present only a small fraction of the problems our 
students may some day have to deal with, we are con- 
cerned not only with presenting factual information, 
but even more with developing their power to reason 
and handle new problems. Medical students have widely 
divergent backgrounds and needs, as well as differing 
interests. For these and other reasons, we need a truly 
two-way, free-format discussion where each student is 
treated as an individual. Anatomy, the field in which 
we teach, is very much a visual science. Consequently, 
graphic capabilities are important. Here also the student 
needs to interact and be treated as an individual. 

It should be pointed out that we are computer naive 
people who have been working without professional 
help. We have been using a system and a language 
which nicely meet the requirements for which they 
were designed, but in approaching the needs of higher 
education, programming becomes laborious and circuitous. 
We are well aware that others working with 
more sophisticated systems have produced more so- 
phisticated results. Indeed, to many our methods may 
seem primitive. However, our challenge has been to 
implement natural language analysis, self-adaptive 
programming, and interactive graphics within a frame- 
work of restricted costs. It is important that people in 
the computer field be made aware of the systems and 
language requirements of people in various areas of 
education. For CAI ever to become a reality, it must 
first become an interdisciplinary endeavor. 

The system and language 

Our work has been centered around the IBM 1500 
Instructional System and its associated language — 
COURSEWRITER II. The 1500 is supported on 1130 
(32 K) hardware. Peripheral equipment consists of 32 
terminals, each with a cathode ray tube (CRT), a 128 
character keyboard input and a light pen, a typewriter 
unit, and a 16 mm random access image projector. 

COURSEWRITER II is an interpretive, non- 
computational language. Both COURSEWRITER II 
and the 1500 Instructional System are described in 
detail in IBM publications. 

Natural language analysis 

Our basic format is schematically illustrated in 
Figure 1. There are two different types of discussions. 
The large circle represents an anatomical discussion 
and the smaller satellites clustered around it represent 
what we call clinical problems. The cluster of clinical 
problems surrounding each anatomical unit is directly 
related to that block of anatomical material. We try 
to have a ratio of at least ten clinical problems to each 
anatomical discussion. Thus the organization is modular 
and any number of these modules may be linked to- 
gether to form a course segment or topic of discussion. 
At the present time we try to keep these course segments 
small enough that the average student can complete 
them in 30-60 minutes. 

Figure 1 — A schematic representation of the modular 
unit. The large circle represents an anatomical 
discussion, the smaller satellites represent 
clinical problems. A large number of these 
modular units are interconnected to 
form a course segment or topic of 
discussion 

To facilitate the description we shall consider a 
discussion of the extrinsic muscles of the eye and their 
nerve supply. There are seven such muscles and they 
are supplied by three nerves. This course segment 
consists of 13 modules, i.e., 13 anatomical discussions 
and their associated clinical problems. 

A student signs on a terminal for a particular course 
segment, i.e., he chooses the general topic he wants to 
discuss. He is then presented with a choice: 

DO YOU WANT TO BEGIN BY ASKING 
QUESTIONS? (SQ) 


OR 

DO YOU WANT ME TO INITIATE THE 
DISCUSSION? (CP) 

If he indicates that he wants to ask a question, he is 
branched to a subroutine which handles the analysis 
of student questions (SQ). If he indicates he wants the 
computer to initiate the discussion, he is branched to 
one of the clinical problems (CP) in one of the clusters. 

Clinical problems 

On the initial branch, i.e., if the student elects to 
have the computer initiate the discussion, both the 
cluster and the specific clinical problem in the cluster 
are randomly selected. Each clinical problem is a rela- 
tively brief linear presentation, i.e., three or four state- 
ments, each illustrated by a picture (with the film 
strip projector), followed by one key anatomical 
question. For example, after describing and illustrating 
a patient’s signs, symptoms, and history, the student 
might be asked: 

WHICH MUSCLE IS INVOLVED? 

If he answered this question correctly, he would branch 
to another clinical problem in another cluster or module. 
The student is taken from cluster to cluster in a pre- 
scribed sequence. However, the specific clinical problem 
in each cluster is randomly, but non-repetitively 
selected. As long as he continues to respond appropri- 
ately, he branches from one clinical problem to another 
without ever entering into the underlying anatomical 
discussions. Each time he successfully completes a 
clinical problem, a scoring counter is incremented by 
one. If he were to progress through six of these clinical 
problems, he would have been examined on three of 
the seven muscles, and three of the seven branches of 
the three nerves supplying them. Since the general 
principles of function and methods of testing one 
muscle or nerve are similar to those underlying the 
others, it is our judgment that a student who success- 
fully completes six successive clinical problems correctly, 
in this predetermined sequence, has demonstrated 
mastery of this block of subject matter, and he is told 
so. He may then either sign off this course segment or 
continue in it as long as he desires. It is possible for 
someone to sign on, complete six successive clinical 
problems, and be finished in as little as two minutes. 
(The value that we require in this scoring counter to 
demonstrate mastery is dependent on the length, 
complexity, and nature of the material discussed.) 
However, if he misses the one key question in any 
clinical discussion, his scoring counter is set to zero 
and he is branched to the corresponding anatomical 
discussion. 
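
The control flow of the clinical problems can be summarized in a short sketch. This is an illustration in Python, not COURSEWRITER II, and several details are assumptions of ours: the class and method names, the wrap-around from the last cluster to the first, and the refilling of a cluster once all of its problems have been used. The mastery value of six is the one quoted above for the eye-muscle segment.

```python
import random

MASTERY_SCORE = 6   # value quoted in the text for this course segment


class Session:
    def __init__(self, clusters, rng=None):
        # clusters: lists of clinical-problem ids, in the prescribed order
        self.rng = rng or random.Random()
        self.clusters = clusters
        self.unused = [list(c) for c in clusters]
        self.position = self.rng.randrange(len(clusters))  # first cluster is random
        self.score = 0

    def next_problem(self):
        """Pick a problem at random, without repetition, from the current cluster."""
        pool = self.unused[self.position]
        if not pool:                         # assumption: start over when exhausted
            pool.extend(self.clusters[self.position])
        return pool.pop(self.rng.randrange(len(pool)))

    def record(self, correct):
        """Update the scoring counter and report where the student goes next."""
        if not correct:
            self.score = 0                   # a miss resets the counter
            return "anatomical_discussion"
        self.score += 1
        self.position = (self.position + 1) % len(self.clusters)
        if self.score >= MASTERY_SCORE:
            return "mastery"
        return "clinical_problem"


clusters = [[f"CP{c}-{i}" for i in range(10)] for c in range(13)]
session = Session(clusters, rng=random.Random(0))
outcomes = [session.record(True) for _ in range(5)]
outcomes.append(session.record(False))
print(outcomes[-1], session.score)   # anatomical_discussion 0
```

Five correct answers followed by one miss leave the counter at zero, so the student must again complete six problems in succession after leaving the anatomical discussion.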

Anatomical discussions 

The anatomical discussions differ from the clinical 
problems in several important respects: 

(1). They are highly branched. For some questions 
there are as many as 35 anticipated answers with up 
to ten different branches, depending on which answer 
is given. It does not require many such nodal points 
to produce a highly complex network. It is possible 
for a student to stay in a single anatomical discussion 
for 30-40 minutes without retracing his steps. How- 
ever, it is unlikely for him to have to do so, since 
hopefully he is learning at every decision point. 

(2). For each anatomical discussion there usually is 
only one starting point, and one logical exit point. 
Despite the complexity implied above, the entrance 
and exit points may be adjacent to each other, i.e., 
it is possible to come in, answer two questions, and 
be out. In practice this seldom happens, since some 
subset of the question that permits him to get out 
is included in the clinical problem which sent him 
into the anatomical discussion. We simply are fol- 
lowing the well known pedagogical axiom that one 
can only hope to get across one or two major points 
in a discussion. Some individuals can appreciate these 
general principles in their barest form, while others 
need elaboration. 

Let us illustrate this with one example. The student 
misses a clinical problem and enters an anatomical 
discussion. The first question he is asked may be: 

WHAT IS THE ACTION OF THE RIGHT 
SUPERIOR RECTUS MUSCLE? 

The correct answer, assuming the patient is looking 
straight ahead to start with, is: 

IT MOVES THE EYE SUPERIORLY, MEDI- 
ALLY, AND ROTATES IT IN A CLOCKWISE 
DIRECTION AS YOU FACE THE PATIENT. 

This may sound like a fairly difficult question and 
certainly we obtain a variety of answers. However, the 
student is shown how to reason out the answer by a 
series of leading questions and explanatory pictures. 
The question that follows the correct answer to the 
first question, and the one he has to answer to get out 
of this part of the discussion in essence is: “What would 
you do with this knowledge?” More specifically he is 
asked: 

WHAT WOULD YOU ASK A PATIENT TO DO 
THAT WOULD TEST THE ACTION OF THE 
RIGHT SUPERIOR RECTUS AND ONLY 
THIS MUSCLE? 

Here is where our challenge lies — to teach the student 
to question the validity and significance of facts — to 
train him to reason. What good is it that a physician 
know the action of a muscle if he cannot utilize this 
knowledge by testing the muscle in his patient? 

(3). If the student entered the anatomical discussion 
via a clinical problem and reaches this normal exit 
point, i.e., has answered the above question correctly, 
he will branch to the next clinical problem and once 
again try to answer six in a row correctly. 

(4). At any point in an anatomical discussion, but at 
no point in a clinical problem, the student may ask 
any question he wants. He is then branched to a sub- 
routine which analyzes student questions. 

Student questions 

At any point in an anatomical discussion when he is 
asked a question he may choose not to answer it, but 
rather to ask a question of his own. His motivation may 
be that he thinks his own question will lead him to the 
answer he is lacking, or he may in effect be saying: 
“Okay, I've had enough of this particular line of con- 
versation, let’s proceed to something I don’t already 
understand.” Whenever he asks a question three 
things are permanently recorded: his name, his 
question, and where in the program he asked the 
question. The address of the question he avoided 
answering is stored so he may be returned to this 
point. 

His question is first prescanned (key letter analysis) 
in order to determine whether it is germane. If not, 
he is told so and returned immediately to the question 
he avoided answering. However, if his question is 
germane, it is further analyzed and he is branched to 
some other point in that anatomical discussion or into 
another module, where he is shown how to reason out 
the answer to his question. We prefer this to giving 
direct answers to his questions. If he is branched to a 
place where his question is answered immediately, the 
reasoning behind this answer and a probing analysis 
follow. 

Once the student is in the question asking routine, 
and after his question has been answered, he has several 
options open to him. (1) He may continue to ask as 
many questions as he likes, thus branching from point 
to point within a given anatomical module or, more 
commonly, branching from one anatomical discussion 
to another. (2) He may signal the computer at any 
time that he is ready to return to the point where he 
asked his initial question. (3) He may, without knowing 
it, reach the normal exit point of an anatomical dis- 
cussion. However, since he is in the question asking 
mode, he is treated differently than if he had entered 
via a clinical problem. Instead of being branched to 
another clinical problem, he is returned to the point 
where he asked his initial question. Thus there is no 
way he can avoid the question he originally chose not 
to answer. 
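
The return behavior of the question asking mode can be sketched as follows. The class Navigator, its method names, and the address strings are hypothetical; the sketch only illustrates that the address of the avoided question is stored once, and that both an explicit signal and the normal exit point lead back to it.

```python
class Navigator:
    def __init__(self):
        self.return_address = None   # set only while in question asking mode

    def ask_question(self, current_address, target_address):
        """Branch to the place where the question is handled."""
        if self.return_address is None:
            # Only the point of the initial question is remembered.
            self.return_address = current_address
        return target_address

    def done_asking(self):
        """The student signals that he is ready to go back."""
        address, self.return_address = self.return_address, None
        return address

    def reach_exit(self, next_clinical_problem):
        """Normal exit point of an anatomical discussion."""
        if self.return_address is not None:
            return self.done_asking()        # question mode: go back
        return next_clinical_problem         # entered via a clinical problem


nav = Navigator()
nav.ask_question("AD3:Q2", "AD7:start")
nav.ask_question("AD7:Q1", "AD9:start")   # follow-up question, same return point
print(nav.reach_exit("CP-next"))          # AD3:Q2
print(nav.reach_exit("CP-next"))          # CP-next
```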

Remember that when he first signed on the course 
segment he was given an option as to whether he wanted 
to ask a question or whether he wanted the computer 
to initiate the discussion. If he chose to ask a question 
at that time, he would have entered exactly the same 
subroutine. He would have been handled in the same 
manner with the following minor exceptions. If he 
reached the normal exit point of an anatomical dis- 
cussion, he would be branched back to his starting 
point and given the option again. If he signalled the 
computer that he had tired of asking questions, he 
would in essence be saying that he wanted the computer 
to take the initiative and would then be branched to a 
randomly selected clinical problem. 

There are several distinct advantages to the experi- 
mental format currently being used by our students. 

(1). Authoring is greatly facilitated by the use of 
modular units for course construction. It is one thing 
to sit down and write a lecture or linear presentation, 
but quite another to outline a highly branched, open- 
ended discussion. The smaller the modules, the easier 
this is to perform. 

(2). Relevance and interest are maintained through 
the “clinical problem” approach to human anatomy. 
The clinical problems, however, are just one type of 
application question which is common to many disci- 
plines. They provide a certain amount of interest or 
spice to the learning of what otherwise might appear 
to be a series of facts or skills which often seem irrele- 
vant. The question of “relevance” is even more im- 
portant than providing interest. There is more to 
learn than we have time to teach and sometimes we, 
as teachers, tend to get carried away by details that 
happen to have special interest for us. Thus the appli- 
cation questions help to keep us “honest” and relevant. 
If a piece of anatomical knowledge cannot be accessed 
via a clinical discussion, perhaps we should question 
its significance to the student. 

(3). The ability of the student to ask free format 
questions and be shown how to reason out the answers, 
gives him the feeling of being treated as an individual. 
He can literally chart his own path through a dis- 
cussion until he is ready to be evaluated, i.e., to enter 
the clinical problems. Teaching the ability to ask 
questions and to reason out the answers is one of the 
most difficult tasks we face as teachers. 

(4). The high ratio of clinical problems to anatomical 
discussions, the redundancy and highly branched nature 
of the anatomical discussions themselves, and the 
ability of the student to ask free format questions, all 
contribute in permitting students taking the same 
course segment to have relatively unique experiences. 
Not only do they get different clinical problems, but 
they may not even be taken to the same anatomical 
discussions. We find that this variety of experience 
inside the classroom stimulates discussion outside the 
classroom. 

(5). It is the combination of the features discussed 
above that permits one student to be told he has 
mastered the material in one course segment in as 
little as two minutes, while another student may spend 
several hours to attain the same degree of mastery. 
This raises the interesting implication of informing a 
student when he has attained sufficient mastery of 
the entire subject matter, rather than giving him a 
course grade. Some students, either because of ability 
or previous experience, might achieve this level of 
mastery in a month, while another student might 
require the present six months. The faster students 
would have a lot of free time which could be spent on 
other courses, independent study, electives, or re- 
search. Thus it is conceivable that once a curriculum 
were implemented on the computer, the student’s 
medical college transcript might more meaningfully 
consist of a record of how many things he accomplished 
during his training, rather than a series of numerical 
grades. 

Artificial intelligence 

This may be a rather grandiose term for the rather 
primitive examples we have, but we want to discuss 
two general topics, i.e., improving the methods by 
which we handle the student questions, and developing 
self-adaptive programs. 

Student questions 

As a result of experience with students on the initial 
course segments, we found that a large percentage of 
the questions they asked either were not answered or 
were not handled appropriately. This does not have to 
happen very often to discourage a student from asking 
any further questions. However, we recorded every 
question a student asked, so we were able to review 
them. We found there were three main reasons for 
mismatches on the questions: (1) the question was not 
related to the subject matter being discussed, (2) the 
student did not provide enough information, and (3) 
he provided too much information. 

(1). If the question is not pertinent to the subject 
being discussed, we have no need to answer it. This 
was determined by prescanning for keywords. We 
found, however, that we could not always tell whether 
the question was not pertinent, or whether it was just 
not specific enough. Basically we solved this by equating 
certain synonymous terms and by adding to the number 
of keywords in the prescan. We also added two other 
levels of scanning. The first is for such things as leg, 
arm, thorax, etc., which are parts of the body far 
removed from the eye. If these are detected, the student 
is told, for example: 

WE ARE NOT DISCUSSING THE LEG AT 
THIS TIME. PLEASE LIMIT YOUR QUES- 
TIONS TO THE SUBJECT UNDER DIS- 
CUSSION. 

We have a second level which includes keywords re- 
lated to the region, but not to the subject. Thus if 
his question referred to the maxillary nerve, part of 
which does run through the orbit, he would be told: 

THE MAXILLARY NERVE IS RELATED TO 
THE ORBIT, BUT DOES NOT INNERVATE 
ANY OCULAR MUSCLES. 

If he did not match on any of these three levels of 
prescanning, he would simply be told: 

YOUR QUESTION DOES NOT APPEAR TO 
BE WITHIN THE SCOPE OF OUR DIS- 
CUSSION. DO YOU WANT TO REPHRASE 
IT? 

If he does not choose to rephrase it, he is branched 
back to where he asked the question. Differentiating 
whether his question is not germane or whether it is 
not specific enough is almost essential. Trying to de- 
termine in what way it is not related simply makes 
the dialogue a little more personal and gives the 
student the feeling he is being treated as an individual. 

(2). The most common difficulty was that the student 
did not supply us with enough keywords, i.e., his 
question was not specific enough. Thus we have de- 
veloped a little subroutine which helps him make his 
question more specific. For example, if the only key- 
word we detect is MUSCLE, we ask him: 

WHICH MUSCLE OF THE EYE AND WHAT 
DO YOU WANT TO KNOW ABOUT IT? 

He then is given the chance to rephrase his question. 
Thus with relatively little programming we can inter- 
act with the student in a conversational manner until 
his question is understood. On the basis of previous 
experience we feel we will be able to handle most of 
the questions asked. 

(3). The third area where we sometimes had difficulty 
was when the student provided us with too many 
keywords. It is a surprising fact that the number of 
keywords required in a given course segment to provide 
us with enough information to answer a question is 
remarkably constant. In the program on the muscles 
of the eye it was three. When there were too many 
keywords, analysis showed he was usually asking more 
than one question, or at least what he thought was 
a single question could be broken down into two 
smaller ones. Less frequently he was simply being too 
verbose. Formerly he would branch on the basis of 
the first three words that matched, but this was not 
always appropriate. Now we count and store the 
number of keywords in his question. If this exceeds 
our magic number, in this case three, the words we 
have detected can be displayed for him on the screen. 
He then is asked to rephrase his question using no 
more than three of these words, or to ask only one 
question at a time. 
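
The three kinds of mismatch, and the way each is now handled, can be drawn together in one sketch. It is written in Python for illustration and makes several assumptions that the text does not settle: the keyword lists are invented, whole words are matched rather than key letters, and the two exclusion lists are consulted before the subject keywords so that a question about the maxillary nerve is caught even though it contains NERVE. The function returns a tag naming the response; the wording of the responses is given above.

```python
import re

# Hypothetical vocabularies; the actual keyword lists are not given here.
SUBJECT_KEYWORDS = {"muscle", "nerve", "superior", "inferior", "rectus",
                    "oblique", "action", "test", "supply"}
FAR_REGION_KEYWORDS = {"leg", "arm", "thorax"}   # parts far removed from the eye
NEAR_REGION_KEYWORDS = {"maxillary"}             # in the region, not the subject
REQUIRED_KEYWORDS = 3   # the "magic number" for the eye-muscle segment


def analyze_question(question):
    """Return (tag, detail) describing how the question should be handled."""
    words = re.findall(r"[a-z]+", question.lower())
    for word in words:
        if word in FAR_REGION_KEYWORDS:
            return ("off_region", word)          # "WE ARE NOT DISCUSSING THE ..."
    for word in words:
        if word in NEAR_REGION_KEYWORDS:
            return ("related_region", word)      # related to the orbit only
    # Subject keywords, without duplicates, in the order they were typed.
    found = list(dict.fromkeys(w for w in words if w in SUBJECT_KEYWORDS))
    if not found:
        return ("out_of_scope", None)            # offer a chance to rephrase
    if len(found) < REQUIRED_KEYWORDS:
        return ("too_vague", found)              # ask which muscle, and what
    if len(found) > REQUIRED_KEYWORDS:
        return ("too_many", found)               # display the words detected
    return ("answer", found)


print(analyze_question("What is the action of the superior rectus?"))
print(analyze_question("Tell me about the muscle"))
print(analyze_question("Where does the maxillary nerve run?"))
```

Treating exactly three keywords as sufficient, fewer as too vague, and more as too many is our reading of the text for this one course segment; another segment would need its own value.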

Self-adaptive programming 

We would like the program to modify itself on the 
basis of experience, much as a teacher learns from his 
experience with students. As a result of our own re- 
search in neurophysiology, we feel that two basic 
aspects of learning are: (A) an increase in seeking or 
exploratory behavior following cessation of a rewarding 
stimulus, 1,2 and (B) habituation or the dropping out 
of unrewarded components of a response. 3 A teacher, 
at least a good teacher, when challenged is ready to 
increase the variety of his response. This is an example 
of exploratory behavior. He may do this by retrying 
responses that were previously part of his repertoire, 
but had been temporarily discarded, i.e., had undergone 
habituation. He may also increase his repertoire of 
response by incorporating responses acquired from 
experience with students. At the present time we have 
only begun to incorporate these learning concepts into 
instructional programs. 

The following are examples of capabilities we consider 
necessary for the computer if it is to approach the 
versatility required in tutorial discussions. The first 
two exist only as isolated demonstrations at selected 
points, because COURSEWRITER II does not permit 
us the computational ability to do this on a large 
scale. The third example, which we consider of utmost 
importance, has not actually been implemented as yet, 
but we foresee no major obstacles, except for the limited
computational capacity of the system. 

(1). If a certain percentage of the students (currently
20 percent) all ask the same question at the same point
in the program, subsequent students are branched as
though they had asked the same question. They are
treated as though they were in the question-asking
mode, e.g., when they reach the normal exit point of
an anatomical discussion, they are returned to the
point where they came from. This branching is dynamic
and reversible in the sense that the need for asking the
question is constantly evaluated. Thus if two students
in any series of ten ask the same question at the same
point, every odd-numbered student that follows is
branched as though he had asked the question. Even- 
numbered students are not branched. If nine of the 
next ten even-numbered students fail to ask the ques- 
tion, the branch is deleted. However, if two or more of 
them do, then the branch is reinforced, i.e., three out of 
every four successive students will be branched. 
Certainly if a significant number of students did 
ask the same question at the same point in a dis- 
cussion, we as teachers would probably modify our 
approach. How often this will occur and whether the 
percentage should be greater or less than 20 percent 
are questions we cannot answer until we can test it on 
a larger scale. 
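
The branching rule in (1) can be sketched as three small decisions. This is a minimal Python illustration under stated assumptions, not the existing demonstration coding: the state names "trial" and "reinforced" and all function names are invented here, and "three out of every four" is modeled simply as skipping every fourth student.

```python
THRESHOLD = 0.20   # fraction of students asking the same question
WINDOW = 10        # series length used in the example in the text


def should_install(asked):
    """True if enough of the last WINDOW students asked the question."""
    recent = asked[-WINDOW:]
    return len(recent) == WINDOW and sum(recent) >= THRESHOLD * WINDOW


def evaluate_trial(control_asked):
    """Decide the fate of a trial branch.

    control_asked holds one flag per unbranched (even-numbered) student:
    True if that student still asked the question on his own.
    """
    if len(control_asked) != WINDOW:
        raise ValueError("need exactly %d control students" % WINDOW)
    # Two or more askers reinforce the branch; nine or more silent delete it.
    return "reinforce" if sum(control_asked) >= 2 else "delete"


def is_branched(student_number, state):
    """Whether a given student is routed through the automatic branch."""
    if state == "trial":        # every odd-numbered student
        return student_number % 2 == 1
    if state == "reinforced":   # three out of every four students
        return student_number % 4 != 0
    return False                # no branch installed
```

Keeping the even-numbered students unbranched during the trial is what makes the rule reversible: they serve as the control group that shows whether the question is still being asked.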

(2). There often are several places in a program to
which we could branch a student in response to his 
question. At present we make this choice for the 
students. We plan to give them some degree of control 
by forming a hierarchy of possible branch points. 
Originally these will be evaluated by us as first, second,
third, or fourth choices. However, each time a student
is branched and reaches the point where we think his 
question should have been answered, he will in effect 
be asked: “Okay?” or “Does that answer your ques- 
tion?” If he says yes, the likelihood of that branch 
will be augmented. If he says no, he will be branched 
to another point and the likelihood of the original 
branch will be decremented. Thus what we thought
was the least plausible response to a given question 
may be shown to be the most desirable on the basis of 
experience with students, and it will achieve the status 
of the initial branch without any manual interference 
by the author. 
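
The planned hierarchy of branch points in (2) might be sketched as below. The numeric weights, the unit step, and the function names are assumptions for illustration only; the text specifies no more than that a likelihood is augmented on "yes" and decremented on "no".

```python
def record_feedback(weights, branch, satisfied, step=1):
    """Augment or decrement a branch's likelihood after a student replies."""
    weights[branch] += step if satisfied else -step


def ranked_branches(weights):
    """Branch points ordered from most to least likely."""
    return sorted(weights, key=lambda b: weights[b], reverse=True)


def answer_question(weights, student_says_ok):
    """Try branches in ranked order until the student is satisfied.

    student_says_ok is a callable standing in for the student's reply
    to "Does that answer your question?".
    """
    for branch in ranked_branches(weights):
        ok = student_says_ok(branch)
        record_feedback(weights, branch, ok)
        if ok:
            return branch
    return None
```

Starting from the author's ranking, e.g. `{"A": 4, "B": 3, "C": 2, "D": 1}`, two students who are satisfied only by branch D are enough in this sketch to move D to the head of the list.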

(3). One of the most significant ways a teacher learns
from experience with his students concerns the unan- 
ticipated but appropriate answer. Right now we record 
all unanticipated answers and review them periodically. 
Occasionally an unanticipated answer proves to be 
more perceptive than the anticipated answers the 
author programmed. At present such a student is 
treated as though he were wrong. 

When a student gives an unanticipated answer and 
feels he is treated in an inappropriate manner, why not 
permit him the option of repeating his answer and 
treating it much as we would a question? Essentially 
he would be entering a “debate mode”. We feel that 
our question answering routine is sufficiently flexible 
now that he would eventually be taken to a point 
where he could decide whether his original answer was 
valid or not. If it proved invalid, he would be branched 
back and his pathway erased. However, if he felt he 
had won his point, then his route could be preserved. 
This would then become an anticipated answer for 
subsequent students. In interpersonal discussions our 
students often challenge us and not infrequently they 
win their point. However, even if this occurred only 
once in a thousand times, these are the type of re- 
sponses we would least like to discourage. How can we 
profess to encourage our students to question and 
reason and then give an inflexible response? This is a 
level where computers are not presently competitive 
with a human tutor. 

Since we have not yet implemented this, and do not 
want to be considered idle dreamers, we shall elaborate 
on how we intend to program this type of ability. 
First it should be made clear that we are not talking 
about situations where the student’s response involves 
evidence not available in the program. We are talking 
about situations where he reasons from one logical 
statement to another. Let us cite a specific example. 
In our original version of the discussion of the eye, we 
programmed many anticipated answers to the question:

WHAT IS THE ACTION OF THE SUPERIOR 
RECTUS? 

One answer we did not program was:

THAT DEPENDS ON THE STARTING 
POSITION OF THE EYE. 

We subsequently modified the program to include this 
as an anticipated answer. However, the inherent logic 
was already present for the student to have won his 
point. If he had asked, in debate mode:

WHAT IS ITS ACTION IF THE PATIENT 
STARTS BY LOOKING MEDIALLY? 

he would have been given one answer. If he then asked:

WHAT WOULD ITS ACTION BE IF THE
PATIENT STARTS BY LOOKING LATERALLY?

he would have been given a very different answer. 
Clearly this would prove that the action depends on 
the starting position of the eye. 

The computer has no such ability to reason, but the 
student does. Thus we are permitting him to make
a value judgment. He could signal the computer that
these two answers made his point and subsequent 
students would then branch there, rather than along 
the path previously followed. Since we are permitting 
the student to make a value judgment that affects the 
subsequent course of his fellows, the process must be 
reversible. Thus the next ten students who gave the 
same answer would be asked by the program whether 
they understood the line of reasoning that followed. If 
the consensus were yes, then the branch would remain; 
if it were no, then the branch would be deleted. 

Interactive graphics 

Gross anatomy is very largely a visual science. 
Knowing the three dimensional relationship of one 
structure to another is a fundamental basis for clinical 
diagnosis. The best way to organize this information 
is with pictures, so our students are encouraged to 
spend a lot of their time sketching. In our linear (non- 
computer) programmed teaching they can actually sit 
and copy pictures that are projected. The question 
then arises: does the computer offer interactive graphic
capacities that are competitive? In order to explain
what we have done and our problems in this area, it 
will be necessary to go into some of the details of the 
system with which we work, since it is quite different 
from what most people think of when they speak of 
CRT graphics. 

The 1510, which is the CRT, light pen and keyboard 
unit, was designed primarily for the rapid display of 
text, and its designers assumed that its graphic appli- 
cations would be limited. The usable area on the face 
of the CRT is 4^ X 8 inches. It may be thought of as 
a grid consisting of 32 rows and 40 columns (Figure 2). 
A standard alphanumeric character would occupy two 
of these boxes, i.e., two rows by one column. Each box 
on this grid, i.e., each one row by one column unit, 
may be thought of as a matrix of 48 potential dots of 
light, six dots high and eight dots wide (Figure 3). 
Thus the entire screen consists of a maximum of 61,440 
dots (192 vertical X 320 horizontal). Actually these are 
more accurately described as horizontal slashes; the 
dots are wider than they are high as may be seen in 
Figure 4. This is a significant factor which must be 
considered in preparing the drawings, to prevent dis- 
tortion. 
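
The screen arithmetic above can be checked with a short sketch. The figures (32 rows, 40 columns, a six by eight dot matrix) come from the text; the zero-based indexing and the function names are assumptions made for the example.

```python
ROWS, COLUMNS = 32, 40          # addressable boxes on the screen
DOTS_HIGH, DOTS_WIDE = 6, 8     # dot matrix inside one box


def screen_dots():
    """Total dot rows, dot columns, and dots on the screen."""
    dot_rows = ROWS * DOTS_HIGH
    dot_columns = COLUMNS * DOTS_WIDE
    return dot_rows, dot_columns, dot_rows * dot_columns


def locate(dot_row, dot_column):
    """Map a screen dot to (box row, box column, row in box, column in box)."""
    if not (0 <= dot_row < ROWS * DOTS_HIGH and 0 <= dot_column < COLUMNS * DOTS_WIDE):
        raise ValueError("dot lies off the screen")
    box_row, inner_row = divmod(dot_row, DOTS_HIGH)
    box_column, inner_column = divmod(dot_column, DOTS_WIDE)
    return box_row, box_column, inner_row, inner_column
```

`screen_dots()` gives 192 dot rows by 320 dot columns, i.e., the 61,440 dots quoted above.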

Figure 2 — This shows the organization of the screen into 32 rows
and 40 columns forming 1280 addressable units. The standard
alphameric characters occupy two of these boxes, i.e., two rows
by one column

Figure 3 — The 48 dot matrix defined by the intersection
of one row and one column

Figure 4 — A CRT display of the skull and mandible
from the side

The system provides a standard character dictionary
and the user may define additional graphic sets. These
graphic characters, as defined by the system, occupy
such a large part of the screen that the likelihood of
being able to use the same graphic character to construct
more than one picture is almost nil. It is analogous
to taking a printed page, dividing it into four
quadrants and saying you can use these quadrant units
to write anything else you desire, as long as you don’t
change any of the letters or words. It quickly became
obvious that we did not want graphic characters of 
the type just defined, but rather we needed a graphic 
alphabet. Just as in the case of the English language, 
given the 26 letters of the alphabet, one can write any- 
thing he likes, so given the means to directly access 
each of the 48 dots in each box, we could draw any 
pictures we desired. 

That is basically what we did; we defined a character
dictionary with each character being a single dot 
(Figure 5). Thus with what amounts to little more than 
1/3 of one character dictionary area we can draw as 
many pictures as we desire. The backspace function 
permits superimposition of characters. Thus if our 
display instruction were to contain the following characters,
as defined in our graphic alphabet, i.e., BCDEFJNRVbfjklmn,
and there were a backspace command
between each of them except the “m” and the “n,”
we would get the dot pattern shown in Figure 6. Notice 
that omission of the backspace instruction caused the 
“n” dot to appear in the six by eight box one column 
to the right. 
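
The effect of the backspace can be illustrated with a small decoder for the coding given in the caption of Figure 6. This is a sketch under assumptions, not the 1510's actual behavior in detail: in particular the assignment of letters to dot positions in `dot_position` is HYPOTHETICAL (the real layout is the one defined in Figure 5), and the function names are invented for the example.

```python
DOTS_HIGH, DOTS_WIDE = 6, 8


def dot_position(letter, upper):
    """HYPOTHETICAL layout: A-X upper case fill dots 0-23, lower case 24-47."""
    index = ord(letter.upper()) - ord("A")
    if not 0 <= index < 24:
        raise ValueError("letter outside the assumed A-X alphabet")
    if not upper:
        index += 24
    return divmod(index, DOTS_WIDE)   # (dot row, dot column) within the box


def decode(coding):
    """Turn a display coding into {box column: set of (row, col) dots}."""
    boxes = {}
    cursor = 0
    upper = True
    for symbol in coding:
        if symbol.isspace():
            continue
        if symbol == "<":             # subsequent characters are upper case
            upper = True
        elif symbol == ">":           # subsequent characters are lower case
            upper = False
        elif symbol == "%":           # backspace: superimpose on previous box
            cursor = max(cursor - 1, 0)
        else:
            boxes.setdefault(cursor, set()).add(dot_position(symbol, upper))
            cursor += 1               # each character advances one column
    return boxes
```

Decoding `"< B %C %D %E %F %J %N %R %V > %B %F %J %K %L %M N"` places fifteen dots in the first box and, because no backspace precedes the final character, one dot in the box to its right.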

Figure 5 — The characters used to define our graphic
alphabet. The character plus its case determines
the position of the dot within the matrix

Figure 6 — The pattern produced by the following coding
< B %C %D %E %F %J %N %R %V > %B %F %J %K %L %M N
The “<” defines subsequent characters as upper
case. The “>” defines subsequent characters
as lower case. The “%” is the code for
the backspace function

We always limit our display instructions to one row
at a time and we put as many instructions on each row
as possible, i.e., we try to break our pictures up into
the smallest units we can. This permits us greater
freedom with the input buffer (250 character limit),
facilitates debugging, allows us to modify pictures with
a maximum of ease, provides animation capacity, and
is especially useful when we give the student the capacity
to draw his own pictures. However, there is at
present one very serious limitation to putting multiple
display instructions on a single row. No erasure occurs
between adjacent rows, i.e., the upper and lower limits 
of each box are inviolate. However, erasure does occur 
between adjacent columns. Let us assume that we had 
two adjacent boxes on the same row filled completely 
with dots [Figure 7 (A)]. If subsequently any pattern 
were displayed on the same row in the column just to 
the left of this, e.g., a vertical line in the extreme left 
of the box, we would get the pattern shown in Figure 
7 (B). With a subsequent display instruction on the same
row but in the column just to the right of our original
display, e.g., three dots vertically arranged in the extreme
right of the box, we would get the pattern shown
in Figure 7 (C). Thus a display insert command erases
five dot columns to the right of the insert and three
dot columns to the left on the same row. We have been
told that this can be improved on a hardware level so
no erasure will occur. This would be of utmost importance
to anyone who wants to exploit the graphic
capacities, especially in having the student draw on
the CRT.

Figure 7 — This shows the problem of erasure with multiple display
inserts on the same row. The insertion of the single dot
column in (B) causes erasure of 5 dot columns from the original
pattern in (A). Insertion of the 1/2 dot column in (C) causes
erasure of 3 dot columns from the original pattern

The resolution of the light pen is limited to one box 
as defined by one row, one column. Light detected from 
one box can be differentiated from light in any other 
such box. Two lighted dots are required to produce a 
detect and these two dots must be separated by one 
dot row. Thus the patterns shown in Figure 8 (A) would
all permit detection; those in Figure 8 (B) would not.

We use the light pen as a pointer. We have not been 
able to devise any means of using it as a stylus, although 
we do have various ways in which we can have the
student draw on the CRT. Some of the ways we use 
CRT graphics are enumerated and briefly described. 

Identification 

In the CRT display shown in Figure 9, we have the 
student use the light pen to identify the structures 
labeled in Figure 10. We feel that since we are dealing 
with a picture approximately four inches in height, 
this is pretty good resolution. As with verbal questions, 
we branch selectively not only according to whether 
he is right or wrong, but also on the basis of what the
nature of his error is. Thus his thinking is analyzed
and he is led by discussion or demonstration to the
correct answer. Since the face of the CRT is behind a
glass cover, we have a parallax problem. The boxes
that he is trying to define measure only 1/6 X 1/6
inches. We cannot vary the intensity of the beam by
tracking the pen. We can require a double detect, i.e.,
on the first detect temporarily erase the adjacent boxes
and ask in effect: “Is this what you want to point to?”
However, in practice we do not find this necessary.
After a little experience the students make very few
parallax errors.

Figure 8 — (A) shows three dot patterns which permit light pen
detection; (B) shows three dot patterns which would not be
detected by the light pen

Figure 9 — A CRT display of the base of the skull

Figure 10 — This shows some of the structures that we
require the student to identify using the light pen
on the graphic display shown in Figure 9. Labels:
Anterior palatine foramen, Infra-orbital foramen, Greater
palatine, Infra-orbital fissure, Hamulus, Foramen lacerum,
Foramen ovale, Foramen spinosum, Spinous process,
Petrotympanic fissure, Hypoglossal canal, Carotid canal,
Styloid process, Stylomastoid foramen, Jugular foramen

Animation 

We use a few examples of animation in the usual 
sense such as moving the eyes, swallowing, etc., which 
can be done in the insert mode. This is quite effective 
as long as only part of the picture has to be regenerated. 
More commonly we employ animation in the sense of 
drawing something slowly for purposes of emphasis. 
For example, when we ask a student to point to where 
a nerve originates, after he does so correctly, we may 
respond by having the nerve “grow” out along its 
course. 

Enlargement 

The 1510 has no vector or scaling capacities. How- 
ever, we do present a small scale view of a structure 
such as the skull and then enlarge certain parts of it 
in 2X steps until we get the desired resolution for 
light pen interaction or to show greater detail.

“Drawing” on the CRT

The quotation marks are to emphasize that the light 
pen cannot be used as a stylus. This would be desirable, 
of course. However, this is not as great a limitation 
as it might seem, since we are trying to get the students 
to appreciate spatial relationships and proportions, 
rather than training them as artists. There are several 
means by which we permit students to generate their 
own pictures and have them evaluated. In each of 
these instances, the erase feature is a distinct limitation, 
and we are actually delaying much of our development 
in graphics until a hardware modification comes through.

(1). We present the student with our dot matrix and
have him input from the keyboard, evaluating his 
picture segment by segment. This may sound artificial 
but it works quite well. However, from the keyboard 
there is a 100 character input buffer, so here, more than 
anywhere else, we feel the limitation of the erase 
feature. 

(2). We put a lighted square in each box. The student
has three modes of operation from which to choose. If
he is in the insert mode, touching a lighted box causes
the square to be replaced by an asterisk-like symbol.
The replace mode causes the square to replace the 
asterisk, e.g., if he changes his mind. When he is finished 
he enters the erase mode in which every square he 
touches disappears and he is left to view his finished 
drawing [Figure 11 (A)]. The drawing is then evaluated 
by the computer, and those parts of his drawing that 
are judged to be accurate are regenerated using our 
graphic alphabet. Thus his drawing, represented in 
Figure 11 (A), would be presented back to him as in 
Figure 11 (B). However, any parts of his drawing not 
judged accurate would be left alone and he would have 
to try again. A photograph of this view of the skull is 
shown in Figure 12. 

(3). We have every bone in the body drawn on
coordinate paper. On the CRT a graph paper grid 
provides the lighted matrix for the light pen detect. 
In essence we have him point to a series of points and 
if he is correct, we generate the line of appropriate 
contour between successive points. With soft tissues, 
e.g., organs, muscles, etc., we are concerned with their 
relation to bones. The bony skeleton then becomes the 
lighted matrix upon which he draws. For example, it 
is of vital importance that the student know the normal 
projections of the heart and its various subdivisions 
onto the thoracic cage from every angle. Thus we 
present him with a graphic of the bony rib cage and 
ask him to point to where each chamber or structure
crosses the bones, and generate the pictures as he
progresses.

Figure 11 — (A) shows a crude form of light pen drawing by the
student; (B) represents the computer evaluation of the drawing
using our graphic alphabet

Figure 12 — A CRT display of the skull from the front

SUMMARY 

We have tried to describe some of the natural language 
analysis, self-adaptive programming, and interactive 
graphic capabilities we feel are required for medical
education. Although the system and language we have 
been using were designed for CAI, they were not de- 
signed for the further capacities toward which we have 
tried to force them. We would like to have a system 
and a language that were tailored to meet the needs of 
higher education.

CAI is expensive, but so is medical education in
its present form. Any tool that would significantly
improve the quality of medical education can hardly
be denied on the basis of cost. The real question is
whether CAI can justify itself on a performance basis.
Perhaps in two, five, or ten years the computer industry
will feel the state of the art justifies a real commitment
to this field. However, will what they produce
truly meet the needs of the medical educator unless a 
really interdisciplinary phase of research and development
is undertaken now?


Design principles for processor maintainability
in real-time systems 


by H. Y. CHANG and J. M. SCANLON 

Bell Telephone Laboratories 
Naperville, Illinois 


INTRODUCTION 

With the arrival of large real-time, time-shared 
systems, the requirement of system reliability has be- 
come even more demanding. The result of even a 
momentary system misbehavior could be catastrophic, 
since any disruption of service is experienced by all 
the users on-line at that time. Thus for real-time 
systems such as telephone switching systems, airline 
reservation systems, on-line teaching machines, etc.,
where numerous users are served, and critical real-time 
systems such as command and control, a high degree 
of system dependability and maintainability must be 
realized. 

Since many of the real-time systems employ the 
concept of centralization of logic, the overall system 
reliability objective in large part depends on how well 
the central processor itself meets the dependability 
and maintainability objectives. For a processor, the 
dependability objective often calls for the use of reliable 
components, conservative circuit design techniques and 
various redundancy methods. The maintainability 
objective, on the other hand, demands a processor 
architecture that is best suited for automatic trouble 
detection, recovery from faults and fault isolation, so 
as to insure operational survivability in an environment 
which is not free of faults. 

The purpose of this paper is to describe several 
design principles which may be used in planning pro- 
cessor organization, designing logic circuits, and fault 
detection and diagnostic tests in order to facilitate 
the design of a highly maintainable processor for 
real-time systems. Our scope will be limited to presenting
a unified account of some design guidelines, most
of which reflect material assembled from a combination
of analytical study and practical experience on a real-time
time-shared system. 1 The problem of achieving
high dependability by the use of various error detection
and correction codes or redundancy techniques has
received adequate treatment in the literature, 2-4 and
will not be included here. In the second section we de- 
scribe the various observed trouble symptoms and their 
manifestation in the system. A maintenance sequence 
for preserving the system’s integrity upon occurrence 
of faults is then suggested. Guidelines for planning a 
processor organization to achieve high maintainability 
are discussed in the third section. Several principles 
for designing logic circuits and fault detection and 
diagnostic tests are described in the fourth section. 

System malfunctions and recovery procedures 

An important first step in establishing a fault re- 
covery and detection philosophy for a particular 
system is to establish the possible failure modes of 
both system and device components. On a system 
level, trouble symptoms usually manifest themselves 
in some form of mutilated data. They can be caused 
by errors in transmission or reception of data among 
the various units; e.g., a bit erroneously set on a memory 
access. Or, they can result from errors in internal 
data manipulation, e.g., attempting to reach an ad- 
dress which has been incorrectly computed. 

On a device level, the trouble symptoms with dis- 
crete logic implementation usually correspond to 
single, hard faults (by common assumption). A permanently
open diode and a transistor output stuck-at-1
(s-a-1) are some examples of this class of faults.
However, these troubles usually manifest themselves 
in some observable system malfunction. With the 
advent of integrated circuit technology, more complex 
and varied device failure modes may be expected. 

As one of the principal requirements in a real-time 
facility is to provide continuous service, the system 
must remain operational even in a fault environment. 
This dictates that trouble symptoms be recognized
and the associated fault be isolated and repaired, with 
little or no interference from the user’s standpoint. 
This objective can be implemented by devising a fault 
recovery procedure. A fault recovery procedure usually 
consists of the following steps: fault detection, fault 
recognition, system recovery, and fault diagnosis. 

Fault detection is usually a function entirely per- 
formed by a variety of hardware implementing error 
detection codes such as parity checks, one-out-of-N 
codes, etc., and analogue signal margin checkers. 
Systems incorporating some level of redundancy may
also use matching between duplicated modules as a 
means of fault detection. In all cases the checker itself 
should routinely be examined by programs to insure 
its validity. 
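
Two of the checks named above are simple enough to sketch in software. The functions below only illustrate the codes themselves, since in the systems discussed the checking is done by hardware; even parity and the function names are assumptions made for the example.

```python
def even_parity_bit(data_bits):
    """Parity bit that makes the total number of 1s even."""
    return sum(data_bits) % 2


def parity_ok(word):
    """Check a word whose last bit is an even-parity bit."""
    return sum(word) % 2 == 0


def one_out_of_n_ok(lines):
    """A one-out-of-N code word is valid only if exactly one line is 1."""
    return sum(lines) == 1
```

A single parity bit detects any odd number of bit errors in a word, while a one-out-of-N check flags both "no line selected" and "more than one line selected".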

The objective of fault recognition is to resolve a 
failure to a particular subsystem (e.g., a memory 
module, an input/output channel controller, etc.).
This is done by first establishing the type of error 
which has occurred such as a parity failure on a mem- 
ory read, and determining from that information, 
through some analysis procedure, what subsystem 
contains the fault. The analysis procedure may include 
a sequence of instruction retries in order to distinguish
the hard faults from the transients, and then to resolve 
the failure to the subsystem level by alternately exer- 
cising various suspected candidates. It may also ex- 
amine subsystem error indicators, over some period 
of time, to accumulate clues pointing to the source 
of malfunction. 
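
The retry-and-exercise analysis can be sketched as follows. This is a minimal illustration, not a description of any particular system's fault recognition program; the retry count of three, the callables standing in for an instruction retry and a subsystem exercise, and the function names are all assumptions.

```python
def classify_fault(operation, retries=3):
    """Re-run a failing operation to separate transients from hard faults.

    operation is a callable returning True on success; it stands in for
    an instruction retry. Returns "transient" if any retry succeeds and
    "hard" if every retry fails.
    """
    for _ in range(retries):
        if operation():
            return "transient"
    return "hard"


def isolate_subsystem(suspects, exercise):
    """Exercise each suspected subsystem in turn; return those that fail."""
    return [name for name in suspects if not exercise(name)]
```

Only a fault classified as hard needs to be resolved further to a subsystem; a transient can be logged in the error indicators mentioned above.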

Once the failure is resolved to a subsystem, choosing 
the next step in the fault recovery procedure depends 
upon whether or not a spare subsystem is available. 

If a spare is not available, diagnostic action must 
be initiated to determine the identity and location of 
the fault. The normal system operation, which had 
momentarily been interrupted at the time of fault 
detection, must now be suspended through diagnosis 
and repair. The system must then be recovered to a
hardware state and program point where normal pro- 
cessing can be resumed. This sequence of events is 
depicted by Figure 1. 



Figure 1 — Fault recovery sequence (without spare)


However, if a spare is available, a different strategy 
could be taken. The system is first reconfigured by 
interchanging the faulty subsystem with its correspond- 
ing spare, using some method of program controlled 
switching. 1 The recovery procedure is then initiated to 
restore the system to a normal processing state, in 
order to reduce the period of interrupted service. The 
task of diagnosis and repair can be postponed and 
offered to the system as a relatively low priority job 
since it is the most time consuming step of the recovery 
procedure. This sequence of events is depicted in 
Figure 2. 
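
The two sequences can be compared in a short sketch. The step names are taken from the prose above rather than from Figures 1 and 2, and the split into "immediate" and "deferred" lists, like the function name, is an assumption made for illustration.

```python
def recovery_plan(spare_available):
    """Ordered steps after a fault, split into immediate and deferred work."""
    immediate = ["fault detection", "fault recognition"]
    deferred = []
    if spare_available:
        # Reconfigure first, restore service, and diagnose later.
        immediate += ["switch in spare", "system recovery", "resume service"]
        deferred += ["diagnosis", "repair"]   # run as a low-priority job
    else:
        # Service stays suspended through diagnosis and repair.
        immediate += ["diagnosis", "repair", "system recovery", "resume service"]
    return immediate, deferred
```

With a spare, the most time-consuming step drops out of the interval during which service is interrupted.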

A comparison of Figures 1 and 2 illustrates some of 
the maintenance advantages of hardware redundancy. 
First the diagnostic task, which generally consumes 
more time than all the other recovery steps combined, 
can be deferred and interleaved with normal system 
processing on a time-shared basis after the system is
restored to sanity. Secondly, the availability of a
spare permits a “good” vs. “bad” comparison type of
diagnostic testing where the “good” machine interrogates
the faulty machine. This type of testing is
readily programmed because of the availability of a
spare and hence can be automatic. Without some level
of redundancy, an approach must be used whereby
the operator acts as the interrogator. This implies
manually forcing the machine through recovery steps
as illustrated in Figure 1. However, in practice it is
often advisable to provide some subsystems with
spares, and some without, to arrive at a balance of
cost versus reliability.

Figure 2 — Fault recovery sequence (with spare)

In most applications, the central processor, whether 
under program control or some combination of manual 
and program control, acts as the executor of any 
system recovery scheme. Thus it is of paramount 
importance that the central processor itself be highly 
maintainable. With this in mind then, the remainder 
of this paper will concentrate on outlining maintenance 
design principles for the central processor, regardless 
of the system environment in which it must perform. 

Structural considerations for processor maintainability 

Past experience has indicated that the effectiveness 
of programmed testing depends not merely on the 
techniques used in deriving tests and test results, but 
also on the inherent structural maintainability of the 
central processor unit. The central processor main- 
tainability is generally constrained by such factors as 
the modularity of the logic organization, the availability
of accessible test points, etc. It is, therefore,
appropriate to list some of the desirable guidelines to 
be included for consideration in order to achieve overall 
processor maintainability. 

Modularization 

In planning a processor organization for main- 
tainability, modularization is of utmost importance. 
The processor should be composed of well defined 
functional modules, with a minimum number of inter- 
modular feedbacks.* This is desirable to confine the 
effects of malfunctions as well as to facilitate pro- 
grammed testing. Specifically: (a) the function of each 
module should be definable as a register, decoder, 
sequencer, etc. Irregularities such as scattered special 
flip-flops imbedded into a well-defined decoder or 
sequencer, or circuits with a mixed mode of synchronous 
and asynchronous operations should be avoided. The 
symmetry and the regularity exhibited by the 
structure of these modules often imply uniformity in 


* An intermodular feedback is a control and/or data path that 
traverses a ring of functional modules. 


the trouble symptoms caused by faults in these modules. 
As a result, a considerable amount of effort in designing 
tests and deriving test results can be saved. For ex- 
ample, an attempt should be made to keep general 
purpose registers logically equivalent so that a single 
set of diagnostic tests will be applicable to all registers; 
(b) the interface between modules should be “con- 
trollable” and be as simple as possible. This implies 
that the number of intermodular feedbacks be mini- 
mized and that a uniform and consistent method of 
controlling information flow between modules be 
established. A common practice in designing tests for 
a large processor is to treat each functional module 
individually. As a result it is usually difficult to foresee 
global problems created by interaction among modules. 
Many of these interactions can lead to inconsistent 
test results, i.e., test results that may change from 
diagnosis to diagnosis. 6 For example, a fault in module 
A may prevent the initialization of some circuits in 
module B. If the test(s) for detecting this fault, due 
to the presence of global feedbacks, also depend on 
the proper initialization of these circuits in module B, 
the test results become inconsistent. In a large pro- 
cessor with many functional modules, the testing 
problems created by these interactions can be extremely 
complicated. Thus, a “clean” interface between mod- 
ules is very desirable. This means that in the test 
mode, every module should be, either directly or in- 
directly, controllable and monitorable. 

Accessibility and observability 

The result of segmentation of a processor into 
functional modules permits the strategic placement of 
test points for purposes of controlling and/or monitoring
the state of the machine during programmed
testing. A method for test point placement has been 
considered by Ramamoorthy, with the use of graph 
theory. 6 The functional modules of a processor can be 
considered to correspond to nodes of a directed graph, 
and signal paths to edges. The nodes of a graph are 
partially ordered, from primary inputs to primary 
outputs. Feedback loops between nodes can be 
“broken” under the constraint that all nodes remain 
reachable from primary inputs. Additional control 
points are then inserted at places where feedbacks 
have been broken. Test points for monitoring purposes 
should also be added to modules whose outputs are 
not observable, either directly or indirectly, at primary 
outputs. The resultant processor organization is,
therefore, one in which every module is controllable
and monitorable for programmed testing. Consequently,
the accessibility and observability are greatly improved.
Our experience indicates that such a facility can often 
simplify the design of tests and may well improve the 
resolvability of faults. 

As an example, consider the organization shown in 
Figure 3(a). Each box represents a functional module. 
Global feedback loops (BEFB), (CEDC) and (CDC) 
are broken at edges FB and DC. Every module is 
still accessible from its primary input (through module 
A). Control points are added at FB and DC to enable 
modules B and C. An additional test point is also 
required at the output of module F for monitoring purposes.
The resultant organization, with modules partially 
ordered, is shown in Figure 3(b). Note that every 
module is accessible from its primary input and/or 
added control points and the outputs of every module 
are monitorable at its primary output and/or added 
test point(s).
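
The checks described above can be sketched in a few lines of Python. Figure 3 is not reproduced here, so the graph below is an assumption: the loop edges follow the three loops named in the text, while the edges A→B and A→C and the choice of module D as the one feeding the primary output are hypothetical. The sketch only verifies a given choice of broken edges; it does not choose them.

```python
# Loop edges from the text: (BEFB), (CEDC), (CDC).
# Assumed for illustration: A feeds B and C; D drives the primary output.
EDGES = {("A", "B"), ("A", "C"), ("B", "E"), ("E", "F"), ("F", "B"),
         ("C", "E"), ("E", "D"), ("D", "C"), ("C", "D")}
BROKEN = {("F", "B"), ("D", "C")}   # feedbacks broken in the text
PRIMARY_INPUT, PRIMARY_OUTPUT = "A", "D"


def successors(edges):
    """Build an adjacency map that lists every node, even those with no fan-out."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
        succ.setdefault(dst, set())
    return succ


def reachable(succ, start):
    """Return the set of nodes reachable from start (start included)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_acyclic(succ):
    """Kahn's algorithm: the graph is acyclic if every node can be removed."""
    indegree = {node: 0 for node in succ}
    for node in succ:
        for nxt in succ[node]:
            indegree[nxt] += 1
    ready = [node for node, deg in indegree.items() if deg == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for nxt in succ[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return removed == len(succ)


cut = successors(EDGES - BROKEN)
controllable = reachable(cut, PRIMARY_INPUT)
observable = {node for node in cut if PRIMARY_OUTPUT in reachable(cut, node)}
needs_test_point = sorted(set(cut) - observable)
```

Under these assumptions the cut graph is acyclic, every module remains reachable from A, and F is the only module that no longer reaches the primary output, which agrees with the added test point at the output of module F.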

As will be seen in a later section, a modular organi- 
zation with adequate test points will greatly simplify 
the design of tests. Thus far only the design guidelines 
for the structure of a processor have been touched 
upon. Some principles for the behavior aspect are in 
order. 


Figure 3a — Functional modules of a processor: an example



Interrupt and rollback mechanisms 

As was mentioned earlier, a prime maintainability 
objective of real-time, time-shared systems is to 
preserve the system integrity in the presence of faults. 
The use of error detection and correction circuits may 
detect and mask out the misbehavior caused by some 
faults. For example, a system employing a Hamming 
code can effectively mask out single errors and recognize
double errors. However, in real-time operations
the task of recovery from a fault occurrence usually
requires a combination of program and hardware
mechanisms. Special interrupt circuitry must be pro- 
vided which is triggered by fault detection circuits to 
initiate the recovery process. Protected storage must 
also be provided to preserve the state of the machine 
in order to restart the program after the system has 
been recovered from a hardware failure. 
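
As an illustration of masking single errors and recognizing double errors, the following is a minimal sketch of an extended Hamming code on a 4-bit data word (seven Hamming positions plus one overall parity bit). The word length, bit layout and function names are assumptions made for the example; the text does not specify a particular code.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # Hamming positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)    # Hamming positions that carry check bits


def encode(data):
    """Encode 4 data bits; index 0 holds the overall parity bit."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        # Check bit p gives even parity over all positions whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code) % 2     # overall parity makes the whole word even
    return code


def decode(code):
    """Return (status, data): correct one error, flag two errors."""
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i       # XOR of the indices of all set bits
    overall = sum(code) % 2
    if overall == 1:
        # Odd overall parity: a single error at position `syndrome`
        # (syndrome 0 means the overall parity bit itself is wrong).
        fixed = list(code)
        fixed[syndrome] ^= 1
        return "corrected", [fixed[p] for p in DATA_POSITIONS]
    if syndrome:
        # Even overall parity but a nonzero syndrome: two bits are wrong.
        return "double error", None
    return "ok", [code[p] for p in DATA_POSITIONS]
```

A word with one flipped bit decodes to the original data with status "corrected"; a word with two flipped bits is reported as "double error", at which point the interrupt circuitry described in the text would take over.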

The use of interrupt and rollback mechanisms can 
be illustrated by the following example (Figure 4). 
Suppose the normal sequence of operation is S_1, S_2,
. . ., S_n, where S_i denotes a steady-state point, or a
point to which the program can be rolled back. A
fault is detected while the transition from S_2 to S_3 is
being executed. To prevent mutilation of data, this 
transition should be interrupted and all pertinent infor- 
mation on the state of the machine stored away. The 
system will then enter a maintenance mode to isolate 
and repair the fault. Once the trouble is cleared, normal 
operation can then resume by rolling back to steady
state S_1.

Figure 3b — Partially ordered functional modules of a processor

Figure 4 — The use of interrupt and rollback mechanisms

Interrupt and rollback mechanisms have proven to 
be extremely valuable in real-time operations, es- 
pecially when there are excessive intermittent troubles 
in the system. 7 This additional hardware should itself
be maintained periodically to insure that
it is in proper working condition.
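
The interrupt and rollback sequence can be sketched in software terms as follows. This is only an illustration under stated assumptions: the exception stands in for the fault detection circuits, the saved dictionary stands in for protected storage, and the sketch rolls back to the most recently completed steady state, which is one possible policy. All names are hypothetical.

```python
class FaultDetected(Exception):
    """Stands in for the interrupt raised by fault detection circuits."""


def run_with_rollback(transitions, state, repair):
    """Run the transitions in order, rolling back when a fault is detected."""
    checkpoint = dict(state)            # protected copy of the last steady state
    log = []
    i = 0
    while i < len(transitions):
        try:
            # Work on a copy so an interrupted transition cannot mutilate data.
            state = transitions[i](dict(state))
        except FaultDetected:
            log.append(f"fault during transition {i + 1}; rolling back")
            repair()                    # maintenance mode: isolate and repair
            state = dict(checkpoint)    # restore the saved steady state
            continue                    # retry the interrupted transition
        checkpoint = dict(state)        # a new steady state has been reached
        i += 1
    return state, log


# Usage: the second transition fails once, is repaired, and is then retried.
fault = {"present": True}


def step(s):
    s["count"] += 1
    return s


def flaky_step(s):
    s["count"] += 1                     # partial update before the fault is noticed
    if fault["present"]:
        raise FaultDetected
    return s


def repair():
    fault["present"] = False


final_state, events = run_with_rollback([step, flaky_step, step], {"count": 0}, repair)
```

Because each transition operates on a copy of the state, the partial update made before the fault is discarded when the checkpoint is restored.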

Interface with external devices 

In many systems the central processor and its 
external devices such as memories are interconnected 
via common buses. To test the circuits of the central 
processor that are associated with buses, it is often 
necessary to send data and/or addresses to these 
external devices. This mode of testing is often in- 
efficient as it requires extensive initialization of devices 
in the external communities. Furthermore, the test 
results may be inconsistent since the data is highly 
dependent on the states of these devices at the time 
the central processor is to be tested. In a large system 
a central processor may communicate with numerous 
devices, and interfacing with these units for testing 
presents a serious problem. To avoid this situation, a 
separate return path should be provided (see Figures 
5(a) and 5(b)) so that the testing of interface circuits 
can be simplified. The return path concept establishes 
a testing environment in which the state of the pro- 
cessor during testing need not be dependent upon the 
states of other external subsystems or devices. In some 
cases a saving of twenty to thirty percent of time and 
program space for testing central processor interface 
circuits can be achieved. 

Figure 5a — Processor interface with external system (operational mode)

Figure 5b — Processor interface with external system (maintenance mode)

Circuit and test design

Processor maintainability can be greatly facilitated
if appropriate design principles are followed in circuit
design and in developing diagnostic tests and programs.
In this section, we recommend several such techniques,
most of which are suggested by our experience and
by the results of other workers in the diagnosis field.

Circuit design 

Circuit Redundancy — A fundamental assumption 
shared by all diagnostic methods is the single-fault 
assumption, i.e., one and only one fault may occur 
since the last diagnosis. The presence of an undetected
fault may invalidate this assumption. Consequently, 
the effectiveness of the diagnostic can be weakened. 
Thus the requirement of deriving a complete test set, 
one which is capable of detecting all faults under 
consideration, is necessary in order to reduce the set 
of undetected faults that can occur in the field. The 
presence of redundant circuits greatly complicates the 
design of detection and diagnostic tests and generally 
weakens system maintainability. Although faults in 
redundant circuits may not affect system operation, 
they could invalidate certain tests designed for other 
faults under the single-fault assumption. 8 For example,
the fault α stuck-at-1 (s-a-1) of a redundant circuit
shown in Figure 6 is not detectable. The presence or
the absence of fault α s-a-1 has no effect on circuit
operations. However, suppose α s-a-1 exists and another
fault β stuck-at-0 (s-a-0) occurs. The test
vector (x=0, y=1), which was originally designed for
detecting β s-a-0, is no longer valid, as the path
y→β→z has been “desensitized”. 9-11

As verifying the validity of all tests under all com- 
binations of undetectable or redundant faults is im- 
practicable, circuit redundancy should be eliminated 
whenever possible. 
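
The effect can be reproduced on a small redundant circuit. Since Figure 6 is not reproduced here, the circuit below is an assumption chosen to behave as the text describes: z = y OR (x AND y), where α is taken to be the x input of the AND gate and β the y input of the OR gate.

```python
from itertools import product


def circuit(x, y, alpha_sa1=False, beta_sa0=False):
    """z = y OR (x AND y), with optional stuck-at faults on lines alpha and beta."""
    alpha = 1 if alpha_sa1 else x     # alpha: x input of the AND gate
    beta = 0 if beta_sa0 else y       # beta: y input of the OR gate
    return beta | (alpha & y)


def detecting_tests(**faults):
    """Input vectors (x, y) whose output differs from the fault-free circuit."""
    return [(x, y) for x, y in product((0, 1), repeat=2)
            if circuit(x, y) != circuit(x, y, **faults)]


alpha_only = detecting_tests(alpha_sa1=True)             # []: alpha s-a-1 is undetectable
beta_only = detecting_tests(beta_sa0=True)               # [(0, 1)]
both = detecting_tests(alpha_sa1=True, beta_sa0=True)    # []: the test is invalidated
```

In this assumed circuit the AND term is redundant, so α s-a-1 never changes the output. Once it is present, however, the AND gate passes y, and the vector (x=0, y=1) no longer exposes β s-a-0.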

Figure 6 — Example of a redundant circuit

Failure Modes — Many manufacturers have indicated
that the use of integrated circuits yields a highly
reliable design at low cost. However, to the best of
our knowledge the failure modes of integrated circuits
and their effect on the test design methods have not
been fully explored. From samples that have been
studied, the dominant failure modes are still the same
as those of discrete components, i.e., stuck-at-1 and
stuck-at-0 types. However, there are also many other
new modes of failure that may require special attention. 12
Since the integrated circuit configuration can
introduce a number of parasitic components (such as
diodes and capacitances) between connections, inputs
(of NAND gates, for example) can be grounded due to
a parasitic diode short. Other modes of failure that
are characteristic of the physical design include inputs
crossing, inputs simultaneously s-a-1 (due to a mechanical
bond lifting), collector to emitter short, etc.
Until a better understanding of this subject is obtained, 
one must be cautious in adopting a given integrated 
circuit for production. A careful study of the feasi- 
bility of designing tests for detecting faults exhibiting 
possible abnormal trouble symptoms should be made. 

Circuit Behavior — It is generally known that one 
of the most difficult maintenance tasks is to handle 
faults, which may be intermittent or marginal, yielding 
inconsistent failure symptoms. Many of these faults 
are caused by gradual component deterioration due 
to aging, manufacturing defects, etc., which are
unavoidable. There are others that are caused by overly
critical timing, or unrealistically tight tolerances, and 
can probably be avoided by careful design. Examples 
of these cases include (a) a hard fault in one circuit 
which causes marginal operation in another circuit 
(e.g., hard fault in a voltage regulator), (b) a hard 
fault in one circuit which prevents the initialization 
of another circuit (e.g., a fault in a clock gate), (c)
faults which cause circuit operation that is dependent 
upon equipment options employed in the unit being 
diagnosed, and many others. The test results obtained 
under these circumstances are usually unpredictable. 
To avoid diagnostic inconsistencies, the test designers 
are required to perform the time-consuming, arduous
task of reviewing the entire unit to uncover these
deficiencies, and organizing test sequences by the use 
of early terminations or selected test skipping tech- 
niques. The scope of this task can be minimized if 
circuit designers are encouraged to design circuits 
which are well-behaved even under failure. However, 
since it is unrealistic to assume that diagnostic and 
circuit designers will be completely successful in pre- 
venting marginal or intermittent faults, some tools 
should be provided to aid maintenance personnel in 
resolving abnormal fault conditions. 

Connectivity and Packaging — With the use of large 
scale integration and the increase in logic density, the 
relative cost of factory testing and field maintenance 
is rapidly escalating. Minimizing the number of global 
feedbacks between modules makes the system less 
sequential (more combinational); the task of testing, 
as well as that of generating field maintenance tests 
and a fault dictionary or catalog, is thus simplified. 
However, the situation could be further improved if 
the circuit designer would reduce, wherever possible, 
the number of fan-ins and fan-outs, and especially, 
the number of reconvergent fan-outs.* The problem 
created by reconvergent fan-outs in deriving tests has 
been noted by many workers. 9,10 It greatly complicates
the test generation process and can also affect the 
fault resolvability, as in many cases faults in fan-out 
regions are not distinguishable from those in fan-in 
regions. Thus, reconvergent fan-outs should be avoided, 
wherever possible. 
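
A design-rule check for reconvergent fan-outs can be sketched as a simple graph search. The netlist format (a mapping from each signal or gate to the gates it drives) and the function names are assumptions for illustration. The example netlist is the hypothetical circuit z = y OR (x AND y).

```python
def reachable(fanout, start):
    """Gates reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in fanout.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reconvergent_fanouts(fanout):
    """Map each source to the gates that two or more of its branches reach.

    The list for a source contains the reconvergence gate and anything
    downstream of it.
    """
    found = {}
    for gate, branches in fanout.items():
        count = {}
        for branch in branches:
            for node in reachable(fanout, branch):
                count[node] = count.get(node, 0) + 1
        meets = sorted(node for node, hits in count.items() if hits >= 2)
        if meets:
            found[gate] = meets
    return found


# Hypothetical netlist: y drives both the AND gate and the OR gate.
NETLIST = {"x": ["and1"], "y": ["and1", "or1"], "and1": ["or1"], "or1": []}
report = reconvergent_fanouts(NETLIST)
```

Here the two fan-out branches of y reconverge at the OR gate, so the report contains `{"y": ["or1"]}`, while x has a single branch and is not reported.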

A common practice in circuit packaging has been 
to assemble each type of plug-in package to contain 
several of the same type of logic elements such as 
flip-flops, p-input NANDS, etc. However, this practice 
is not necessarily an optimal one from the viewpoint 
of attaining maximum fault resolvability. As diagnostics
are generally associated with “actions” rather than
with circuits, 13 serial packaging (i.e., organizing logic
elements along paths from inputs to outputs) would
yield a far better diagnostic resolvability than parallel
packaging. Admittedly, serial packaging will result in 
more types of plug-in packages. Since in Medium 
Scale Integration or Large Scale Integration a system 
may only be composed of several of these packages, the 
requirement that faults be isolated to one and only 
one suspected package is quite necessary in order to 
reduce repair time and/or possible additional
unnecessary package replacement. This implies that the
use of a serial packaging technique to improve resolution
should be carefully considered in the design
stage, along with other attributes such as cost of spares,
size, quantity, complexity and production yields, etc.,
to achieve an economic balance.

* Suppose gate B is reachable through gate A along some path(s).
Reconvergent fan-outs of gate A are those fan-out paths that
reconverge at gate B.

Design of maintenance tests 

The design objectives of maintenance procedures 
to enhance processor maintainability are basically two- 
fold: (1) to design a set of tests capable of detecting 
and isolating all single, solid faults to a replaceable 
package level, (2) to insure that test results will be 
consistent for all faults from diagnosis to diagnosis. 
The aforementioned design principles for processor 
architecture and circuit design were aimed at facili- 
tating the design and the application of maintenance 
tests. In this section, we present our recommendations 
on methods of deriving tests and generating test 
results, on techniques of structuring fault detection 
and diagnostic programs, and on diagnostic data
interpretation.

Tests Derivation — Methods of deriving tests for 
logic circuits have been extensively explored. 9,10,14-16
The objective is to generate a set of tests capable of 
detecting each member of a prescribed fault set. The 
most significant result that is applicable to circuits of 
practical size is the path sensitizing concept or the 
D-algorithm technique. 9-11 The idea is to assign a 
certain input test vector to a circuit so that faults 
along some path from input to output will cause the 
circuit output vector to be different from that obtained 
under the fault-free condition. For combinational logic, 
programs for deriving sensitized paths are fairly simple 
to implement. The running speed is also moderate 
for circuits with very few reconvergent fanouts. For 
sequential logic, there is no known technique that can 
efficiently handle circuits with even a moderate number 
of feedback paths. A practical approach, therefore,
would be to design the processor organization with a
minimum number of controllable feedback paths, as
was suggested in an earlier section, so that the logic
becomes purely combinational for the purpose of testing.
The path sensitizing techniques can then be used to
derive a complete test set.

Generation of Test Results — The pros and cons of 
developing a digital fault simulator for generating 
test results as opposed to other alternatives (e.g., the 
manual method and the physical simulation approach) 
have been discussed by Manning and Chang. 16 It was 
concluded that the digital method is extremely useful 
in the early design stage to provide immediate feedback
on the adequacy of hardware design and processor
maintainability. The physical method seems to have 
an edge in computer time required to generate all the 
test results. However, it is not clear how the physical 
method can be used for a circuit realized with inte- 
grated circuit technology. 

At present, for circuits with 100 logic gates a typical 
estimate of required computer time to generate test 
results is about one hour. 17 With improved techniques
for fault simulation, the running time can be substantially
reduced so as to make the digital approach even more
attractive. 16 Readers who are interested in a
detailed description of the development of a digital
fault simulator can refer to several articles by Seshu
(see References 19-21).

Test Ordering and Minimization — The test set and 
the test results generated through the simulation
process usually contain redundancy. In some real-time
systems in which both program space and time are 
at a premium, it is desirable to select a minimum or 
near minimum set of tests that isolate faults only to 
the circuit package level. To accomplish this, the 
tests should first be arranged in “logical” order in the 
same manner as modules of the processor are ordered 
(see Figure 3(b)). This in effect constitutes a paral- 
lelism between the organization of the processor and 
the structure of testing procedure, which is considered 
to be a useful aid in isolating marginal and/or inter- 
mittent faults that produce inconsistent test results 
from diagnosis to diagnosis. Then, the test set for 
each module can be reduced by using one of the known 
methods for selecting an optimum set of diagnostic 
tests. 22-24 
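
As a rough illustration of selecting a near-minimum set of tests that isolates faults only to the package level, the sketch below uses a greedy heuristic. It is not one of the methods of References 22-24. The simulated results, fault names and package assignments are invented for the example.

```python
from itertools import combinations


def select_tests(results, package):
    """Greedily pick tests until faults in different packages are told apart.

    results: fault -> tuple of test outcomes (1 = fail, 0 = pass)
    package: fault -> replaceable package containing that fault
    """
    n_tests = len(next(iter(results.values())))
    # Only pairs of faults in different packages need to be distinguished.
    pairs = {(f, g) for f, g in combinations(sorted(results), 2)
             if package[f] != package[g]}
    chosen = []
    while pairs:
        best = max(range(n_tests),
                   key=lambda t: sum(results[f][t] != results[g][t]
                                     for f, g in pairs))
        split = {(f, g) for f, g in pairs if results[f][best] != results[g][best]}
        if not split:
            break   # the remaining pairs cannot be separated by any test
        chosen.append(best)
        pairs -= split
    return chosen


# Hypothetical simulation output for four faults and four tests.
RESULTS = {"f1": (1, 0, 1, 0), "f2": (1, 0, 0, 0),
           "f3": (0, 1, 1, 0), "f4": (0, 1, 1, 1)}
PACKAGE = {"f1": "P1", "f2": "P1", "f3": "P2", "f4": "P2"}
selected = select_tests(RESULTS, PACKAGE)
```

In this example test 0 alone separates every fault in package P1 from every fault in package P2, so the remaining tests are redundant for package-level isolation even though they would be needed to tell f1 from f2.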

Program Structure — The final phase in the design
of a diagnostic testing procedure is to incorporate the 
tests and test results (obtained through the simu- 
lation process) into a diagnostic program. In order to 
minimize the overall program development effort (e.g., 
programming, debugging, integration and documen- 
tation) and to reduce the program maintenance effort 
(e.g., updating changes, etc.), the program structure 
should be modular, uniform and consistent. To ac- 
complish these objectives, the use of the “data table” 
approach is recommended. 

Basically, the program is composed of two parts: 
the control section and the data table section. The 
data table section consists of a sequence of standard 
entries, each of which specifies the operation of a 
particular test or test sequence for certain modules. 
Typically, each entry specifies (a) the input test vec- 
tor(s) to be applied, (b) the prescribed length of time 
or number of central processor cycles the circuit is
forced or allowed to operate, (c) the expected circuit
response or output(s), (d) the information necessary
to interpret the results, and (e) the required initialization
information, if any. The information contained
in the data table can be derived with the aid of
a digital simulator. The control section, on the other 
hand, is a program designed to interpret these entries 
and perform functions such as initialization, segmen- 
tation of tests, interfacing with other programs, ma- 
nipulation of test outputs, etc. Figure 7 illustrates the 
layout of a typical diagnostic program structure. 
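
The division into a data table and a control section can be sketched as follows. The entry fields mirror items (a) through (e) above. The field names, the `apply` callback that drives the unit under test, and the simulated faulty register are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    vectors: list                 # (a) input test vectors to apply
    cycles: int                   # (b) processor cycles the circuit runs
    expected: list                # (c) expected responses, one per vector
    suspect: str                  # (d) how to interpret a failure
    init: Optional[dict] = None   # (e) initialization information, if any


def run_table(table, apply):
    """Control section: interpret each entry and collect the failing tests."""
    failures = []
    for entry in table:
        responses = [apply(v, entry.cycles, entry.init) for v in entry.vectors]
        if responses != entry.expected:
            failures.append((entry.name, entry.suspect))
    return failures


# Data table section: two tests of a 4-bit register.
TABLE = [
    Entry("reg-all-ones", [0b1111], 1, [0b1111], "register package"),
    Entry("reg-all-zeros", [0b0000], 1, [0b0000], "register package"),
]


def faulty_register(vector, cycles, init):
    """Simulated unit under test: bit 2 of the register is stuck at 0."""
    return vector & 0b1011


failed = run_table(TABLE, faulty_register)
```

If the circuit changes, only `TABLE` needs to be edited; `run_table` stays the same, which is the maintenance advantage of this structure.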

Experience on using this particular design approach 
reveals several advantages: (a) the design process 
becomes standardized, which in turn results in a large 
saving in program development; (b) programs are 
more easily modified, e.g., if the circuit changes, the 
majority of program alterations will be restricted to 
the data table section; (c) test results are easier to 
interpret, and (d) the control section can be written
and debugged independently of the data table section.
Also, the data table lends itself well to the participation
of many designers, e.g., the register designers develop
the data table for the registers, the decoder designers
develop the data table for the decoders, etc. In systems
where a large number of memory fetches carries a time
penalty, the approach may suffer a slight drawback in
that the execution time of the program may increase
over that of the conventional approach. However, this
problem is not serious, as the diagnostic program is not
a frequently executed program.

Data Interpretation — In large real-time systems, the
diagnostic output usually corresponds to an enormous
amount of data (e.g., for a processor of 10^4 gates, a
test vector might be represented by about 5,000 bits,
where each bit designates the pass or fail of a test).
In addition, the observed fact that test results of some
faults are inconsistent from diagnosis to diagnosis
demands a flexible data interpretation procedure.

Several techniques for resolving diagnostic data 
into faulty components or circuit packages have been 
described in the literature. 5 These techniques employ 
the concept of some form of fault dictionary in which 
each entry of the dictionary points to the set of faulty 
components or circuit packages producing the par- 
ticular failure pattern(s). These patterns can be derived 
by simulation. 11,20-26

The simplest form of dictionary is a listing of test 
results where a “0” indicates a test passed and “1” 
indicates a test failed. Faults are located by finding a 
match between the observed failure pattern and the 
entry listed in the dictionary. This technique is ade- 
quate to analyze failure patterns consisting of a rela- 
tively small number of failing tests. For fault conditions 
producing a large number of failing tests, a data 
compression technique to represent the pattern in 
some compact form (e.g., a fixed length number) is 
desirable in order to minimize the system repair time. 
The tradeoff between the isolation accuracy of the 
dictionary and the resolution provided by each of 
these techniques is discussed in Reference 26. The final 
choice of methods for interpreting diagnostic data for 
fault isolation depends on the allowable system downtime
and the availability of skilled maintenance
personnel. 
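
A fault dictionary with compressed entries can be sketched as below. Each failure pattern is a string of “0” (test passed) and “1” (test failed), and a CRC-32 checksum is used here as one possible fixed-length compression. The choice of CRC-32, the package names and the patterns are assumptions for illustration, not the techniques of Reference 26.

```python
import zlib


def signature(pattern):
    """Compress a pass/fail pattern ('0'/'1' string) to a 32-bit number."""
    return zlib.crc32(pattern.encode("ascii"))


def build_dictionary(simulated):
    """Map each signature to the packages whose faults produce that pattern."""
    table = {}
    for package, pattern in simulated:
        table.setdefault(signature(pattern), set()).add(package)
    return table


def diagnose(table, observed):
    """Look up the observed failure pattern; an empty list means no match."""
    return sorted(table.get(signature(observed), ()))


# Hypothetical simulation results: (package, failure pattern over 8 tests).
SIMULATED = [("P1", "10010000"), ("P2", "01100000"),
             ("P3", "10010000"), ("P4", "11111110")]
DICTIONARY = build_dictionary(SIMULATED)
suspects = diagnose(DICTIONARY, "10010000")
```

Two packages share the pattern "10010000", so the lookup returns both and the repair action must cover either. Any fixed-length compression can in principle map different patterns to the same number, which is one source of the accuracy tradeoff mentioned above.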


Figure 7 — Layout of a diagnostic program structure


CONCLUSION 

In this article we have given a unified account of design 
principles for processor maintainability in real-time 
systems. The processor should be functionally well 
modularized with a minimum number of intermodular 
feedbacks. This is necessary to confine the effects of 
malfunctions as well as to facilitate programmed 
testing. To insure the validity of diagnostic data the 
amount of hardcore should be minimized, and ample 
test points must be provided to control the state of 
the machine, even under faulty conditions. Adequate 
system recovery mechanisms must also be incorporated 
to insure system sanity in a fault environment. Further- 
more, the processor should have a clean interface with 
the external devices such as memory units and peripher- 
al systems to enable the rapid identification of trouble 
symptoms to a subsystem level. 

The design of processor logic circuits should be
preceded by a thorough understanding of the failure 
modes of the circuit technology chosen for implemen- 
tation. The elimination of circuit redundancy and the 
incorporation of a packaging scheme which provides 
good diagnostic resolvability are some other desirable 
prerequisites for a good maintenance scheme. Finally,
individual circuits should be examined to determine 
whether an all hardware, hardware-software, or all 
software maintenance facility should be provided. 

The diagnostic program should be structured to 
efficiently implement the selected testing procedure 
(combinational or sequential). It should also provide 
a flexible operator interface to aid in isolating inter- 
mittent faults. Computerized fault simulation methods, 
which enable one to generate and evaluate the diag- 
nostics, should be used throughout the design stages 
to provide adequate feedback on the effectiveness of
the system’s diagnosability.

It is recommended that designers consider these 
guidelines in planning a new machine organization, 
designing logic circuits and maintenance tests so that 
an optimum mix of software and hardware for pro- 
cessor maintainability can be achieved. Because of the 
increased complexities of present and next generation 
computing systems, and because of the rapidly changing 
technologies, new maintenance techniques will have to 
evolve at an accelerated rate. We have only documented 
a few thoughts on guidelines for processor maintaina- 
bility in real-time systems. Our opinions are obviously 
influenced by our training and experience. Since there 
are only a limited number of published documents 
on this subject, we encourage other workers in this 
field to present similar results. 


Effects and detection of intermittent
failures in digital systems


by M. BALL and F. HARDIE

IBM Corporation 
Owego, New York 


INTRODUCTION 

A great deal has been written during the past few years 
on the subject of diagnostic test procedures for digital 
systems. Almost without exception, however, the in- 
vestigators have limited their interest to the detection 
and location of solid faults, and their test procedures are 
usually based on the assumption that either the fault 
exists for the running time of the test procedure or the 
time interval between fault occurrences is less than
the required time to run the test. 

In practice, experience has shown that field failures in 
digital systems used for aerospace application (e.g., 
Titan and Saturn vehicle guidance computers) tend to 
be intermittent in nature. The authors believe that this 
experience is testimony to the efficiency of the current 
diagnostic test procedures in screening solid faults 
from digital systems before delivery for field use, rather
than evidence that failures which develop in the field tend to be
intermittent. That is, diagnostic testing of aerospace 
digital systems using the advanced test procedures 
available today generally detects all solid faults but 
only a small portion of the intermittent faults that 
exist in any digital system prior to delivery to the field. 
The residue of intermittents in the system which 
escaped detection eventually make their presence known 
during field operation. 

The reason for the emphasis on diagnosis of solid
faults is the relative complexity involved in the diagnosis
of intermittent faults. This is the natural course of
evolution in system design as well as in biology: adaptation
to basic environmental requirements with later complex
specialization.


In an attempt to direct the evolution of diagnostic 
techniques along the channels leading to efficient 
detection and location of intermittent failures in digital
systems, the authors conducted a series of experi- 
ments on the effects and detection of intermittent 
failures. Over 500 hours of IBM 7090 time were accumu- 
lated using a sophisticated logic simulator to evaluate 
the Saturn V Launch Vehicle aerospace computer opera- 
tion in both normal and failure modes. The purpose of 
these experiments was to determine the effects of inter- 
mittent failures on computer operation rather than to 
investigate the mechanisms of failure, and to evaluate 
the detectability of classes of failure rather than to 
develop specific techniques for failure detection. 

In this study solid faults were treated as a special 
case of the general class of intermittents. That is, a 
solid fault was treated as an intermittent whose dur- 
ation exceeds the running time of the test program. 
The simulated intermittents were made to vary in
duration from 500 nanoseconds (one clock time of the
simulated computer) upwards, and were specified
in the computer logic at randomly chosen points of 
combinational and sequential circuits. A total of 
792,884 intermittent failures were simulated to give a 
realistic statistical sample. These intermittent points 
were chosen to occur in the program control and 
arithmetic sections of the simulated computer. 

For each intermittent a record was kept of the time 
of error occurrence, time of error detection and the 
number of failures which caused a difference in oper- 
ation from a “good” machine. From these records the 
probability of detection was calculated assuming a
“perfect” error detector. The results showed that many
intermittent failures exert only a weak influence on the
correct operation of synchronous logic circuits. As
shown in Table I, approximately eight percent of the
simulated failures caused the arithmetic element to
perform incorrectly, with a comparable (five percent)
probability of detection by the “perfect” error detector.

TABLE I — Intermittent detection capabilities

Unit              Total     Failure                    Failures Causing     % Affected  Failures  Detected
                  Failures  Duration                   Incorrect Operation  Logic       Detected  %

Adder/Subtractor  267,894   500 nanosec                22,276               8.4         1,118     5
Multi/Divide      269,590   500 nanosec to 5 millisec  22,376               8.3         1,122     5
Program Control   255,400   500 nanosec to 5 millisec  44,704               17          252       0.5


The system simulator

One of the most serious problems confronting the
designers of digital systems is the task of verifying
proposed design features. Both manual analysis and
simulation techniques are used to aid in this task.
During the design and development phase of the
Saturn V Launch Vehicle Digital Computer, a Fault
System Simulator was developed* by IBM to provide
the means of (1) verifying the logical integrity of the
digital equipment, (2) evaluating design changes before
commitment to hardware, and (3) evaluating test
programs. During the course of its use, however,
emphasis gradually shifted to special simulator
applications which generate information on the
characteristics of machine operation to aid the engineer in
diagnosing malfunction symptoms. One of the most
significant series of simulator experiments was concerned
with evaluating the sensitivity of the digital
logic to intermittents.

The system simulator consisted of a compiler, failure
injector, logic simulator, and evaluation programs.
These programs operated on the IBM 7090 computer
as shown in Figure 1. The compiler program produced
7090 instructions for the logic portion of the simulator
program. The failure injection program allowed the
introduction of selected faults into the logic portion of
the simulator program on the component level, that
is, open or shorted diodes and transistor outputs
stuck to a logical zero or a logical one. The simulator
program operates on a 7090 description of the digital
equipment (a logic master tape) to simulate the logical
behavior of the equipment in normal operation and in
various failure environments.

* F. Hardie and R. J. Suhocki, “Design and Use of Fault Simulation
for Saturn Computer Design,” IEEE Trans. on Electronic
Computers, Vol. EC-16, No. 4, August 1967, pp. 412-29.

Figure 1 — Simulator flow diagram

The simulator program executed special test pro- 
grams and displayed, by means of print-outs, the state 
of selected logic nodes or register contents at every 
clock time of an instruction cycle. In investigating 
the behavior of equipment containing logic failures, 
simultaneous failure environments were provided by 
using parallel simulation techniques, and the system 
states for each environment were determined simultaneously.
Of the 36-bit 7090 word, 3 bits were used
to represent the normal system state and each of the
remaining 33 bits was used to represent a failed state.
Multiple faults were simulated by injecting 2 to 25
failures into a single bit position.
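
The idea of parallel simulation, in which each bit position of a machine word carries the state of a different failure environment, can be sketched as follows. The 4-bit word, the use of bit 0 for the fault-free machine, and the small circuit z = (a AND b) OR c are assumptions for illustration; they are not the 7090 word layout or the Saturn logic.

```python
WIDTH = 4                      # bit 0: fault-free machine; bits 1-3: one fault each
ALL_ONES = (1 << WIDTH) - 1

# machine bit -> (faulty line, stuck-at value); hypothetical fault list
FAULTS = {1: ("a", 0), 2: ("c", 1), 3: ("g", 1)}


def spread(bit):
    """Replicate one logic value across every simulated machine."""
    return ALL_ONES if bit else 0


def inject(word, line, faults):
    """Force the bit of each machine whose fault sits on this line."""
    for machine, (faulty_line, stuck) in faults.items():
        if faulty_line == line:
            mask = 1 << machine
            word = (word | mask) if stuck else (word & ~mask)
    return word


def simulate(a, b, c, faults):
    """Evaluate z = (a AND b) OR c for all machines with word-wide operations."""
    wa = inject(spread(a), "a", faults)
    wb = inject(spread(b), "b", faults)
    wc = inject(spread(c), "c", faults)
    g = inject(wa & wb, "g", faults)       # one AND serves every machine at once
    z = inject(g | wc, "z", faults)
    good = z & 1
    # A fault is detected when its machine's output differs from bit 0.
    return [m for m in faults if ((z >> m) & 1) != good]


detected_110 = simulate(1, 1, 0, FAULTS)   # [1]
detected_000 = simulate(0, 0, 0, FAULTS)   # [2, 3]
```

Each logic operation is performed once per gate rather than once per gate per fault, which is where the saving in simulation time comes from.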

Up to 100 logic test nodes were available for print- 
out in each normal or failure environment. Special 
pseudo operation codes allowed additional nodes to be 
retrieved as required. Another pseudo operation code 
caused the contents of selected registers to be placed 
on the simulator output tape for use by the evaluator 
program. 

The evaluator programs identified fault symptoms 
and correlated these symptoms with the injected 
failures. The output of the evaluator was a report of 
detected errors, undetected errors, accuracy of diag- 
nosis, and general behavior of the digital equipment. 

Simulator applications 

The primary applications of the system simulator 
can be grouped into four general categories: design 
evaluation, failure evaluation, data generation, and 
data analysis. The obvious use of the simulator was to 
provide early and rapid verification of the logical 
integrity of the basic hardware designs of digital 
equipment. In addition to checking the basic logic, the 
simulator was used to determine whether certain design 
ground rules were satisfied by the circuit designs, and 
even whether the ground rules themselves were ade- 
quate. For example, individual circuits were checked 
against fan-in and fan-out constraints. In addition, 
the constraints themselves were checked against drive 
and load requirements by applying random and worst 
case parameter values to the drives, driven circuits, and 
circuit loads. 

Delay simulation, incorporating logical element delay 
characteristics in the logic simulator, was used to 
analyze the nature of digital signal propagation in the 
computer designs. Several race conditions were 
detected by the delay simulations which were corrected 
by modifying equipment initialization procedures or 
by design changes. 

Operational and test programs were evaluated on 
the system simulator. Although functional program 
simulators provide nearly error-free programs from 
the standpoint of information flow, an appreciable 
amount of program debugging is usually required 
when the program is first used with the hardware. 
Logic simulator evaluation of programs reduced this 
final debugging phase to a minimum. 

The applications discussed so far pertain to properly 
operating equipments. The logic simulator should be 
regarded in such applications as a tool to aid in design 
analysis and not as a replacement of manual analysis 
and engineering judgment. In the area of failure mode 
analysis, however, the simulator as a tool becomes 
even more important because of the inherent difficulty 
in determining the behavior of failed machines, and 
especially in identifying the fault from the failure 
symptoms. 

The failure injection program and diagnostic evalu- 
ation programs provide a failure evaluation capability 
for the system simulator. Test programs for equipments 
were evaluated for their failure detection and fault 
isolation capabilities. Built-in test circuitry and test 
point configurations were evaluated in the same manner. 
Optimum placement of detection circuits and test 
points was determined by successive simulations. 

Although the evaluation applications represent per- 
haps the most important use of the system simulator, 
the simulator also possesses a capability of generating 
data which is useful not only in design and test of the 
system but also in increasing the capability of the 
simulator itself. For example, a diagnostic catalog can 
be generated as a by-product of a test program evalu- 
ation which relates each injected fault to the resulting 
failure symptoms. The catalog is then available for use 
in evaluating diagnostic programs or procedures in 
further simulations. 

One of the applications of the logic simulator which 
is generally very difficult to perform manually is to 
trace the propagation of an error caused by a com- 
ponent failure, especially when the failure produces a 
loss of program control. Such traces can be generated 
by logic simulation, however, and have important diag- 
nostic value in identifying system faults. The status of 
the failed equipment at every clock time can be de- 
termined by monitoring over a hundred nodes or test 
points internal to the equipment logic, as well as the 
equipment interface. A summary of simulator appli- 
cations is shown in Figure 2. 
Design Evaluation 
    Hardware 
        Basic Logic 
        Design Ground Rules 
        Delay Simulation 
    Software 
        Operational Programs 
        Test Programs 
    Design Changes 

Error Detection 
Diagnostic Capabilities 
Circuit Sensitivity 
    Error Propagation 
    Failure Effects 

Data Generation 
    Node Data 
    Error Traces 
    Diagnostic Catalog 

Data Analysis 
    Laboratory Support 
    Field Failure Analysis 

Figure 2 — Simulator applications 

Simulation of intermittent failures 

The application of the simulator which is the primary 
concern of this paper was a series of experiments to 
determine the sensitivity of logic to intermittent 
failures. Intermittents simulated by the failure 
injection program were made to vary in duration from one 
clock time to the cycle time of the test program 
(representing a solid failure). These faults were injected 
at randomly chosen points in the equipment logic and at 
random points in the test program. 

For each intermittent a record was kept of the time 
of occurrence, time of detection, and the number of 
failures which caused a difference from the “good” 
machine. The results of the simulation indicated that 
many intermittent logic failures had very little effect 
on the operation of the digital equipment: less than 
ten percent of the total failures injected into the 
simulator program caused the logic to perform incorrectly. 
Analysis of the simulation results disclosed that this 
masking of failures by the logic was due primarily to 

• the extensive use of combinational logic, 

• the clocking of the AND gates which feed and/or 
gate the logic levels from the sequential circuits, and 

• the duration and frequency of the intermittent 
failure. 

These simulation results and conclusions were based 
on a relatively small statistical sample — a few hundred 
simulated failures. In order to obtain a realistic sta- 
tistical sample, the failure injection program was 
modified to execute the following procedure: 

1. The logic failure was initiated at the first clock 
time of the test program. 

2. The test program was executed until a state 
difference was detected by the simulator program 
between the logic under examination and a 
“good logic” reference. 

3. Upon failure detection, the time of detection 
and failure symptoms were recorded, the logic 
under examination reset to the same state as 
the reference logic, and the test program ad- 
vanced to the next clock time. 

4. The procedure was repeated for one full cycle 
of the test program. 

The immediate data from this simulation provided 
a measure of the sensitivity of the logic to intermittent 
failures of one clock time duration. That is, the portions 
of the test program during which the injected faults 
cause a deviation from normal operation were identified. 
The same data was used to provide a measure of the 
sensitivity of the logic to intermittent failures of longer 
durations than one clock time by manipulating the 
data with simple editing programs rather than by 
further simulation, making it feasible to accumulate 
information on an equivalent of over a half million 
simulated failures. 
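
The editing step can be sketched as follows. This is a minimal illustration and not the authors' edit program: it assumes that an intermittent lasting several clock times is detected whenever at least one of the clock times it covers was found sensitive in the one-clock-time runs, and that the intermittent is equally likely to begin at any clock time at which it still fits inside the test program. The function name and the sample data are hypothetical.

```python
def sensitivity(hits, duration):
    """Estimate the detection probability for an intermittent of `duration`
    clock times from one-clock-time results.

    hits[i] is True when a one-clock-time failure injected at clock i
    produced a difference from the good machine.
    """
    n = len(hits)
    if not 1 <= duration <= n:
        raise ValueError("duration must lie between 1 and len(hits)")
    # Prefix sums give the number of sensitive clocks in any window in O(1).
    prefix = [0]
    for h in hits:
        prefix.append(prefix[-1] + (1 if h else 0))
    windows = n - duration + 1
    detected = sum(
        1 for start in range(windows)
        if prefix[start + duration] - prefix[start] > 0
    )
    return detected / windows

# Hypothetical record: only clock times 2 and 7 are sensitive to this fault.
hits = [False, False, True, False, False, False, False, True, False, False]
for d in (1, 3, 10):
    print(d, sensitivity(hits, d))
```

Because each longer duration is obtained by re-reading the same record, the number of equivalent simulated failures grows with the number of durations and starting points examined, without further simulator runs.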

To assure the validity of the above techniques, the 
quantitative results concerning the sensitivity of the 
logic to intermittents obtained by the first method, of 
actually simulating failures of various durations, and 
by the second method, of simulating failure durations 
of one clock period and then manipulating the data with 
special edit programs, were compared and found to be 
closely correlated. The combined data from both 
simulation experiments was then used to derive a series 
of curves representing the sensitivity of the logic to 
intermittents of various durations, two of which are 
shown in Figures 3 and 4. The ordinate in each figure is 
the probability that the intermittent will cause a 
malfunction in logic operation, while the abscissa is the 
duration of the intermittent. 

Figure 3 — Sensitivity of arithmetic logic 

The sensitivity of the logic was found to vary ap- 
preciably not only with the class of logic (combinational 
or sequential) but with the operational function of the 
logic circuitry as well. This condition necessitated the 
plotting of sensitivity versus fault duration individually 
for different areas in order to obtain meaningful re- 
lationships. 

A summary of these results is given below: 

• There is a smaller probability of detecting 
intermittent failures in combinational (AND-OR) 
circuits than in sequential (LATCH) circuits. 

• There is a very low probability of detecting a 
single occurrence intermittent failure on a logic 
page (average population of 120 AND, OR, invert 
type circuits). This condition exists because many 
intermittents do not make the “failed” logic act 
differently from the “good” logic and the detection 
of intermittents requires that the logic must be 
exercised by appropriate data for the failure to be 
detected. 

• For these injected intermittents, a fault existing 
for one clock time was virtually undetectable; one 
existing for ten computer word times was about 
50 percent detectable; and one existing for 50 
computer word times was almost 100 percent 
likely to be detected. 

• There is a wide variation of error detection 
sensitivities between computer modules. 

Figure 4 — Sensitivity of multiply/divide logic 

Test program efficiency 

An analysis of simulation results was performed to 
determine the quantity and type of information which 
should be generated by a test program to assure a 
reasonable probability of error detection and fault 
location in the digital equipments. Figure 5 shows the 
efficiency of the test program versus the size of the 
test program for various types of failures. Curve a 
represents a solid failure. Curves b and c represent an 
intermittent failure of 100 clock time duration in typical 
sequential and combinational logic, respectively. Note 
that, although a reasonably high efficiency of detecting 
a solid failure was achieved with a relatively short 
test program (90 percent with 200 instructions), the 
probability of detecting the intermittent was almost 
linear with program size. 

Figure 5 — Efficiency of test program vs. program size 

Many different types of error symptoms were pro- 
duced as a by-product of the simulation experiments. 
Each symptom was analyzed to determine its individual 
and combined value in identifying logic failures. Figure 
6 is a summary of the results of this analysis for solid 
faults. The relative diagnostic values of the error 
symptoms in identifying intermittent failures are about 
the same except that the percentages will be less ac- 
cording to the duration of the intermittent. 

Due partly to unavoidable redundancy in a test 
program (by which a logic element is exercised more 
than once) and due to error propagation in digital 
systems, an error in logic operation resulting from a 
failure of a logic element can occur several times during 
the execution of the test program. The first line of 
Figure 6 indicates the program instruction during 
which an error was first detected. The second line 
indicates the phase, bit and clock time at which the 
error was first detected. The third line indicates the 
first three program instructions during which an error 
was detected. The remaining lines indicate various 
combinations of the above test parameters. 

Observed Failure Symptom                       Failures 
or Parameter                                   Identified 

First Program Step of Detected Error             10.5% 

Time of First Detected Error                     28.1% 

First Three Program Steps of Detected Error      63.2% 

First Program Step of Detected Error and 
Time of First Detected Error                     71.8% 

First Three Program Steps of Detected 
Errors and Time of First Detected Error          83.2% 

First Three Program Steps of Detected 
Errors and Time of Each Detected Error           97.5% 

Figure 6 — Symptom failure correlation 

CONCLUSIONS 

The series of simulation experiments described above 
strengthened the authors’ opinion that the prevalence 
of intermittent failures of digital equipments in the 
field is due to the relatively low efficiency of current 
test techniques in screening such failures before de- 
livery of the equipment to the field. That is, although 
current test techniques cause most of the solid faults 
which are “built into” the equipment during fabrication 
to be discovered before release to the field, a large 
residue of intermittents slip through the test screen 
and cause operational errors during field use. 

The simulation experiments described above did very 
little in the way of deriving a solution to the problem 
of intermittent failures. No attempt was made to 
determine the mechanisms or characteristics of actual 
intermittent faults in existing digital equipment. The 
experiments were designed only to examine the sensi- 
tivity of digital logic to intermittent faults in general, 
without regard to mechanisms of failure. 

The simulation results indicated to the authors that 
current test techniques, slanted toward detection and 
location of solid faults in digital equipment, are 
inadequate for solving the problem of intermittents. The 
experiments showed a rather surprising insensitivity to 
intermittents of short duration. Although this 
insensitivity may seem to be a fortunate characteristic 
for actual operation, it makes the problem of testing 
infinitely more difficult. 
Two general approaches to the test problem are 
obvious: 

• develop better test techniques for detecting and 
locating intermittent faults, and 

• develop techniques for making the intermittents 
appear solid. 

The second approach has found widespread ac- 
ceptance, as indicated by the common use of vibrational 
and thermal stimuli to force intermittent faults to 
expose themselves during factory checkout. In this 
way many intermittent faults are detected that may 
otherwise have slipped through the factory test screen. 
The prevalence of intermittent failures during field 
operation, however, testifies to the inadequacy of this 
approach by itself. 

A third approach is, of course, to design the equip- 
ment to be absolutely insensitive to intermittent logic 
failures. Instruction retry, check point rollback and 
redundancy are being advanced as possible solutions. 
Redundancy, especially triplicated logic with voting,* 
has proven very effective in this area, but not without 
cost in hardware and power. Eventually, when logic 
hardware becomes sufficiently inexpensive, redundancy 
may very well be the way of life and the intermittent 
problem will have been solved.** Meantime, there 
remains urgent need for developing better test tech- 
niques for detecting and locating intermittent faults 
in digital equipment. 

The greater part of test and maintenance cost of 
computer systems today is spent on detecting and 
isolating intermittent failures. Intermittents have com- 
prised over thirty percent of pre-delivery failures 
and almost ninety percent of field failures in several 
computer systems known to the authors, and this 
seems to be the trend in present computer technology. 
Unfortunately, most of the current research in diag- 
nostic techniques is concerned with the detection and 
location of solid failures. 

Logic simulation has provided a powerful tool for 


* IBM Proposes Triple-Redundant Computer, by M. Ball and 
F. Hardie, Computer Design Vol. 6, pages 34-36, Nov. 1967. 

** Self-Repair in a TMR Computer by M. Ball and F. Hardie. 
Computer Design Vol. 8, No. 4, pages 54-57, April 1969. 
studying the effects of intermittents in specific 
computer organizations, but in itself is not a solution 
to the cost problem. Even when these effects have been 
identified, the techniques for designing a computer to 
be intermittent-resistant or for testing a computer to 
locate intermittent failures are not yet state-of-the-art. 

Figure 7 shows a typical curve of the relationship 
of the costs of testing and maintaining a computer 
system from its initial assembly to the end of its useful 
life. The following conclusions may be evident from 
the figure: 

• Intermittent failures are far more costly in test 
and maintenance than solid failures. 

• The cost ratio of intermittent to solid failures 
increases with system usage, especially following 
delivery to the field. The reason for this trend is 
probably the better screening of solid failures by 
current test techniques. 

• The cost of field maintenance remains high with 
usage, and most of the cost is due to intermittent 
failures. This large residue of intermittent faults 
is probably due to inefficient test screening rather 
than to new faults. 

• The costs of a computer system tend to be mono- 
tonically decreasing with use. End-of-life is 
forced by obsolescence rather than by wear-out. 
INTRODUCTION 

Long term mission reliability of a modular computer 
has been studied at Hughes Aircraft Company as a 
consequence of a study with NASA ERC (References 1 
and 2). Particular interest lay in the attainment of long 
term reliability with modular computer organization and the 
effects on reliability of variations in modular organi- 
zation. The results of this investigation are presented 
in this paper. 

In the past, the designers of aerospace computers 
have concentrated on increasing computational speed 
and arithmetic capability within stringent weight and 
power limitations. There seems to be little doubt that 
aerospace computers will soon be extremely fast, 
versatile and compact. A requirement for long term 
system reliability has been developing and may drasti- 
cally change the nature of the on-board computer. 
Extremely long missions are being planned which 
require a computer to operate for one to five or more 
years after launch. Current on-board computer systems 
are not adequate for such missions. 

One promising approach for achieving reliability 
and flexibility is through modular design, where 
independent physical modules, functionally organized 
(e.g., memory, arithmetic, control, Input/Output) can be 
added or deleted to adapt to the required performance 
and processing needs in terms of speed and reliability. 
Improvement in reliability through the use of 
additional hardware has been receiving growing attention 
in the aerospace computer community (Reference 3). 

Specifically in this paper, a technique will be de- 
scribed which when properly applied will determine a 
computer configuration which can satisfy a required 
probability of mission success for some stated mission 
duration. It is assumed that some basic computer 
system exists which can perform the required compu- 
tations; what remains is to determine which additional 
computers or sub-computers should be added to pro- 
vide the necessary system reliability. 

A modular computer design 

Several techniques exist which are designed to in- 
crease the reliability of any given computer system. 
The approach taken here is to have a single computer 
perform all mission computations while a number of 
other computers remain in a dormant mode until the 
working computer fails. At that time, the failed com- 
puter is turned off and one of the dormant computers 
is turned on to resume the computations. The size, 
weight, and power restrictions will typically limit the 
number of spare computers that are available. An 
additional refinement to this concept is to segment 
each computer into functional modules. One possible 
method is to isolate the memory, central processor, and 
I/O functions as in the Hughes H4400 computer. 
Another is to partition the computer into discrete 
memory, control, arithmetic, and I/O units, as in the 
NASA modular computer. This second approach will 
increase the total system parts count, but the increased 
modularity may ultimately enhance system reliability. 
A mathematical model was constructed which permits 
the evaluation of computer reliability for the 
various configurations. 

The two configurations mentioned above were 
normalized in terms of logic complexity to the NASA 
modular computer and the mission reliability was 
evaluated. Analysis showed that short term reliability 
using voting techniques, as would be done during boost 
phase, favors the H4400 modularity concepts whereas 
the long term reliability tends to favor the NASA 
modular computer partitioning. In either case, the 
long term reliability is very sensitive to the logic 
distribution within the modules and to the basic 
reliability of the components. 

A breadboard modular computer of the MCB with 
two modules of each type (a two column system) is 
being constructed. These modules are of sufficient 
complexity to prove many of the points under 
consideration. First, the system may be configured in a 
one active, one standby fashion so that the techniques 
of error detection and reconfiguration may be explored. 
Figure 1 is a block diagram of the NASA computer 
breadboard. The CAU (Configuration Assignment 
Unit) is the module that provides for continuation of 
system functioning under module or switch failure. 
It controls the activation and connection of standby 
modules into the operating system and failed modules 
out of it. The system can be configured so that one 
active string (memory, control, arithmetic, I/O) 
performs computations while the other unused modules 
remain in a standby state. If a failure occurs in any 
module, the module can be turned off and a standby 
module switched in. With the breadboard, if more than 
one of any distinct module fails, the system fails. 

Figure 1 — Modular computer breadboard 

In order to compute the reliability of the modular 
computer, it is necessary to know the number of 
components in each module and their failure rates. Since 
it is also interesting to examine the case of the three 
modules (memory, central processor, I/O) 
configuration, Table I presents a breakdown of components 
for each case. As is shown in the figure, the basic 
component can be either the gate or the integrated 
circuit. 

Enough experience with the NASA breadboard has 
been accumulated to instill a high level of confidence 
that the modular computer concept is sound and indeed 
workable. The system may be arranged in a TMR 
fashion during the boost phase of a mission when 
calculations are proceeding too rapidly to allow 
reconfiguration. After boost, two of the three computers 
would be turned off and the system would enter the 
one-active two standby mode. A block diagram of the 
proposed system appears in Figure 2. With the 
exception of the CAU which would be more complex due 
to the additional modules it must service, the 
component counts for each of the other modules should be 
approximately the same as listed in Table I for the two 
column system. It will be shown that under some 
conditions even dual redundancy for the working modules 
would not provide adequate reliability for a five year 
mission. Several things can be done to eliminate this 
deficiency. First, and most fundamentally, the 
components (gates, IC's, or LSI's), can be made more 
reliable. Second, different configurations can be 
structured such as 4 column and others so as to enhance 
the mission reliability. Certainly, the nature of the 
mission and weight constraints will impose a limit on 
hardware launched. The control unit (or alternatively 
the central processor) is clearly the largest single 
module of the system. If only an additional control 
unit were carried, total reliability would be increased. 
It will be shown, however, for a five year mission, 
considerable reliability enhancement must be made to 
the CAU as well as improvements to the other modules. 



Figure 2 — Three column modular computer 


Several techniques designed to improve the CAU are 
required if the computer is to function for several years. 
One is to build the CAU with as few components as 
is possible; assuming equal component reliability the 
fewer components it has, the longer it will last. A 
further increase in reliability can be obtained by tripli- 
cating the CAU and using TMR voting on its output. 
Although a TMR configuration ultimately becomes less 
reliable than a single unit, this does not occur until 
about 0.7 times the mean life of the individual unit. 
Since the voter unit has relatively few components, its 
mean life is very long and therefore TMR is 
advantageous here. The type of system discussed and 
illustrated in the figures is known as a closed system in 
that at the beginning all equipment is present with no 
additions or repairs possible thereafter. 
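
The crossover point for TMR can be checked numerically. The sketch below assumes exponentially distributed unit failures and a perfect voter, neither of which is stated explicitly in the text; under those assumptions the TMR reliability is 3r^2 - 2r^3 for unit reliability r, and the two curves meet at r = 0.5.

```python
import math

def simplex(t):
    """Reliability of one unit; t is measured in units of its mean life."""
    return math.exp(-t)

def tmr(t):
    """Two-out-of-three reliability, assuming a perfect voter."""
    r = simplex(t)
    return 3 * r**2 - 2 * r**3

def crossover(lo=0.1, hi=2.0, tol=1e-12):
    """Bisect for the time at which TMR stops outperforming a single unit."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tmr(mid) > simplex(mid):
            lo = mid      # TMR still ahead: crossover lies later
        else:
            hi = mid
    return (lo + hi) / 2

print(crossover())        # close to ln 2, i.e. about 0.69 mean lives
```

For a unit such as the CAU, whose mean life is long compared with the mission, the operating point stays well to the left of this crossover.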

In analyzing system reliability, two important 
points for consideration are the rate of degradation of 
standby units and the switch reliability. 

Kletsky (Reference 5) shows that the mean life of a 
closed modular set cannot be increased significantly when 
active and standby failure rates are assumed equal. 
Therefore, it is necessary to obtain a value for this ratio. 
Little direct data, however, is available for “d”, the ratio 
of standby failure rate to active failure rate. However, 
data reported by Nerber (Reference 6) indicate that d is 
considerably less than unity. Nerber analyzed field data for 
over 100 transistorized guidance computers. From this 
data a maximum value of d can be inferred to be about 
0.33. A more recent analysis of Minuteman II computer 
failures by Watson (Reference 7) shows that the expected 
value of d for integrated circuits is 0.55. A lower bound 
for this ratio appears to be 0.12. Extrapolation indicates 
that the ratio will decrease as more data is gathered. For 
future missions, it seems conservative to assume that 
the ratio will lie below 0.5 with 0.1 reasonably 
attainable. Though this ratio may further decrease as 
more is learned, the greatest significance for the closed 
module set reliability is effected with a d of the order 
of 0.1. 

The reliability model must also consider the effect 
of the switches. Though the switch size is held at a 
minimum (typically 8 data bits + 5 control) its effect 
upon long term reliability if not properly treated can 
be great. For instance, the reliability of a single 
crosspoint (with a normalized failure rate, λ_s = 2 × 10^-7) 
is .991, for a 3 × 3 switch .925, and the probability 
that 3 crosspoints out of the 3 × 3 switch will be 
operational is 1 - 10^-10. Thus, for long term system 
reliability of the order of .99 proper treatment of the 
switch reliability is imperative. 

The following paragraphs describe the mathematical 
model developed to accommodate the above factors. 

Reliability model 

The block diagram of Figure 2 will serve as a 
framework for the mathematical model. The mathematical 
model is somewhat more general than Figure 2 in that 
there may be more than four levels (a level consists of 
all modules of one type) of modules with the j-th level 
having N_j modules out of which L_j are required 
operational with N_j - L_j being in standby at the beginning. 
The computer system is operational as long as L_j 
modules are operational at the j-th level for all j's 
with unfailed switch capability for interconnection of 
the L_j's from level to level for all levels and the CAU 
is operational. 

The switch between each level allows connection 
of any module at one level to any module at the other 
level. Further, the switch is designed so that 
independent failures may occur in the switch such that 
certain switching connections are disabled without 
affecting other connections (all other type switch 
failures can be associated with a module). Thus, one 
failure in a switch may make connection of module 1 
of one level to module 2 of another level impossible 
without affecting the connectability of module 1 to 
module 1, module 2 to module 2, or module 2 to 
module 1. Figure 3 illustrates a typical switch. 

It is desired to find the reliability of the total modular 
computer system at any instant of time. L modules 
must operate at each level as well as the switches to 
interconnect them. The problem is approached by first 
finding the reliability at each level then iterating from 
level to level including at each step the switch reliability. 

If switching redundancy is applied to a module level 
with L replicas operating and N - L in standby the 
reliability according to Kletsky (Reference 5) can be 
given as an inverse Laplace transform: 

\[
R_R(t) = 1 - \mathcal{L}^{-1}\left\{ \frac{1}{s} \prod_{K=R-L}^{N-L} \frac{(L+Kd)\lambda}{s+(L+Kd)\lambda} \right\} \tag{1}
\]

where R is the number of modules operational (unfailed) 
for L ≤ R ≤ N and λ is the active failure rate of the 
module while d is the ratio of standby to active failure 
rates. 

The reliability can be reduced to: 

\[
R_R(t) = \sum_{K=R-L}^{N-L} e^{-(L+Kd)\lambda t} \prod_{\substack{h=R-L \\ h \neq K}}^{N-L} \frac{L/d+h}{h-K} \tag{2}
\]

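Since failure is characterized by independent random 
variables the probability P_R(t) of exactly N - R 
failures (R operational) is equal to: 

\[
P_R(t) = R_R(t) - R_{R+1}(t)
\]

Taking the difference and combining terms leads to: 

\[
P_R(t) = \sum_{K=R-L}^{N-L} \left[ \frac{L/d+K}{R-L-K} \right]_{K \neq R-L} e^{-(L+Kd)\lambda t} \prod_{\substack{h=R-L+1 \\ h \neq K}}^{N-L} \frac{L/d+h}{h-K} \tag{3}
\]

where the bracketed factor is taken as 1 for the term K = R - L. 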
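
As a numerical check on the relation between the reliability of a level and the probability of exactly R operational modules, the sketch below evaluates both sums directly. It assumes the readings of Equations (2) and (3) that follow from Kletsky's standby model (a level with K unfailed spares loses modules at rate (L + Kd)λ); the function names and the sample parameter values are illustrative.

```python
import math

def at_least(R, N, L, d, lam, t):
    """Equation (2): probability that at least R of the N modules are unfailed."""
    total = 0.0
    for K in range(R - L, N - L + 1):
        coeff = 1.0
        for h in range(R - L, N - L + 1):
            if h != K:
                coeff *= (L / d + h) / (h - K)
        total += coeff * math.exp(-(L + K * d) * lam * t)
    return total

def exactly(R, N, L, d, lam, t):
    """Equation (3): probability that exactly R of the N modules are unfailed."""
    total = 0.0
    for K in range(R - L, N - L + 1):
        # The leading factor is 1 for the term K = R - L.
        coeff = 1.0 if K == R - L else (L / d + K) / (R - L - K)
        for h in range(R - L + 1, N - L + 1):
            if h != K:
                coeff *= (L / d + h) / (h - K)
        total += coeff * math.exp(-(L + K * d) * lam * t)
    return total

# Illustrative level: one active module, two spares, d = 0.1, lambda*t = 0.5.
N, L, d, lam, t = 3, 1, 0.1, 1.0, 0.5
for R in range(L, N + 1):
    diff = at_least(R, N, L, d, lam, t) - at_least(R + 1, N, L, d, lam, t)
    print(R, exactly(R, N, L, d, lam, t), diff)
```

The two printed columns agree for every R, and the reliability of the level at t = 0 evaluates to 1, as it must.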


To model the effect of the switches a recursive pro- 
cedure is used beginning at Level 1, then Level 2 and 
onto the highest level. 

Let P_RS^(j)(t) designate the probability that at time t 
exactly R modules are operational at the j-th stage 
and exactly S (S ≤ R) of them may be reached from Level 
1 through a path of operational modules and 
sub-switches at the lower levels (Reference 8). Since R ≤ N 
the states of the process that yield at least one 
computer’s worth of capability (for the higher level) 
correspond to: 

(R, S) = (1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3), ..., 
(N, 1), (N, 2), (N, 3), ..., (N, N) 


With reference to the switch of Figure 3, the equation 
governing transition between the various levels is 
given by: 

\[
P^{(j)}_{RS}(t) = P^{(j)}_{R}(t) \binom{R}{S} \sum_{L_{j-1} \le V \le U \le N_{j-1}} P^{(j-1)}_{UV}(t)\, W^{(j)}_{RSV}(t) \tag{4}
\]

where W is the probability that exactly S modules 
at level j are connectable through the switch from V 
modules at level j - 1. For exponential failure: 

\[
W^{(j)}_{RSV}(t) = \left( 1 - \left( 1 - e^{-\lambda'_{(j-1)} t} \right)^{V} \right)^{S} \left( 1 - e^{-\lambda'_{(j-1)} t} \right)^{V(R-S)} \tag{5}
\]

where λ'_(j-1) is the cross point failure rate of the switch 
at the (j-1)st level. 
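
The switch term can be illustrated with a short sketch. It assumes one reading of Equation (5): a module at level j is unreachable only when all V of its crosspoints to the reachable modules of the level below have failed, and crosspoints fail independently at the active rate. The function names and the numeric values are illustrative, not taken from the paper.

```python
import math

def w_connect(S, R, V, cross_rate, t):
    """Probability that a given set of S modules is reachable and the
    other R - S are not, through V candidate crosspoints per module."""
    all_failed = (1.0 - math.exp(-cross_rate * t)) ** V
    return (1.0 - all_failed) ** S * all_failed ** (R - S)

def exactly_s_reachable(S, R, V, cross_rate, t):
    """Binomial weighting over which S of the R modules are reachable."""
    return math.comb(R, S) * w_connect(S, R, V, cross_rate, t)

# Illustrative values: 3 modules above, 2 reachable modules below.
R, V, rate, t = 3, 2, 2e-7, 43800.0
dist = [exactly_s_reachable(S, R, V, rate, t) for S in range(R + 1)]
print(dist, sum(dist))
```

Summed over S from 0 to R the probabilities add to one, which is a useful check on the exponents V·S and V(R - S) in the switch term.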

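Substituting Equations (3) and (5) into Equation (4) 
yields the expression for the probability 

\[
\begin{aligned}
P^{(j)}_{RS}(t) = {} & \sum_{K=R-L_{(j)}}^{N_{(j)}-L_{(j)}} \left[ \frac{L_{(j)}/d+K}{R-L_{(j)}-K} \right]_{K \neq R-L_{(j)}} e^{-(L_{(j)}+Kd)\lambda_{(j)} t} \prod_{\substack{h=R-L_{(j)}+1 \\ h \neq K}}^{N_{(j)}-L_{(j)}} \frac{L_{(j)}/d+h}{h-K} \\
& \times \binom{R}{S} \sum_{L_{(j-1)} \le V \le U \le N_{(j-1)}} P^{(j-1)}_{UV}(t) \left( 1 - \left( 1 - e^{-\lambda'_{(j-1)} t} \right)^{V} \right)^{S} \left( 1 - e^{-\lambda'_{(j-1)} t} \right)^{V(R-S)}
\end{aligned} \tag{6}
\]

where the number of modules N_(j), the number required 
to operate simultaneously, L_(j), and the module failure 
rate λ_(j) are variable from level to level. 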

Thus, the P^(j)(t) vectors are obtained recursively 
starting first from P^(1)(t) which is given 
componentwise by: 

\[
P^{(1)}_{RS}(t) =
\begin{cases}
\displaystyle \sum_{K=R-L_1}^{N_1-L_1} \left[ \frac{L_1/d+K}{R-L_1-K} \right]_{K \neq R-L_1} e^{-(L_1+Kd)\lambda_1 t} \prod_{\substack{h=R-L_1+1 \\ h \neq K}}^{N_1-L_1} \frac{L_1/d+h}{h-K} & \text{if } S = R \\[2ex]
0 & \text{if } S \neq R
\end{cases} \tag{7}
\]

Figure 3 — Module-switch-module relationships 

The total system reliability then including modules, 
switches, and CAU is given by: 

\[
R(t) = R_{CAU} \sum_{L_m \le S \le R \le N_m} P^{(m)}_{RS}(t) \tag{8}
\]

where m is the highest module level and the 
reliability of the CAU is given by R_CAU. 

R(t) then gives the probability that at time t at 
least one string of modules (L_j at each level) is 
connectable and operational and further that the CAU 
can switch modules in and out of the operating string 
when needed. 

Two assumptions implicit in this derivation should 
be noted. The first is that, in assuming a strict 
exponential failure distribution for the modules, wherein a 
module was assumed to fail at a rate λ if in the active 
state and λd if in the passive state, no allowance was 
made for failures which might occur by change of state 
transitions. No algorithm exists for switching modules 
on or off to form an operational string which guarantees 
full utilization of the remaining system without 
requiring that some modules be placed in successive 
modes of active and standby states. It is therefore 
assumed that module failure is not influenced by its 
history of active and standby state transitions. To 
solve the problem otherwise becomes extremely 
difficult. 

The second assumption is that all crosspoints of a 
switch are assumed to be in the active state whether 
or not the corresponding modules are presently 
connected, and therefore crosspoints fail only at the 
active rate (no standby rate). 

The numerical calculation may be facilitated by re- 
writing Equation (3) as : 

p m = /L/d + N 

P * (T) V N-R ) tn^L 

( _ 1 } -B + L + K (| ir R - y^T (9) 

Then expanding and collecting factors 

p.< T )-( L/d N + _V L ) 

e -(L+(R-L)i)T (A 0 — e-<“(Ax — en"- • • (A„ — • •)) 
where 

A„ = ^ N ~ R ^forO<h<N-R — 1 (10) 

Equation (8) requires computer solution. In order to 
achieve a better intuitive grasp for the components of 
reliability of a modular system and to help in the initial 
selection of architectural organisation with potentially 
high reliability over lengthy missions, a first estimation 
of the reliability of a system may be made with the 
aid of the graphs of Figure 4. The figure shows the 
reliability of modules with various degrees of redun- 
dancy against a normalised time scale. Notice the 
change of scale. In the first estimation of the reliability 
the effect of the switches can be neglected so that 
reliability of the system: 

$$R_s = R_{CAU} \prod_{j=0}^{m} R_j \qquad (11)$$

where R_j is the reliability of the modules at the j-th
level for all j's. The R_j's may be found on the ordinate
axis of the graph of Figure 4 when the normalized time
has been computed.


* Use the generalized factorial function, i.e., n! = Γ(n + 1)


To find the reliability of a particular closed module
set, first compute its equivalent units of normalized
time, which are equal to the product of the number of
gates, the failure rate of a gate, and time; then read
the corresponding reliability from the graphs. Use
Equation (11) for approximate system reliability. This 
procedure not only allows one to work out a reasonable 
configuration but also indicates what the basic gate 
reliability must be to attain the required reliability 
with a feasible organization. 

To illustrate the procedure by way of example, 
consider the configuration of Figure 2. Each level can 
be characterized by three parameters: N the number of 
modules, L those operating simultaneously and T 
normalized time. The system then can be summarized 
as: 



         CAU     AU      CU      MU      I/O

N        1       3       3       3       3
L        1       1       1       1       1
T        2.1     1.49    3.54    .85     .88


To compute normalized time, the gate count is obtained
from Table I, the failure rate is assumed to be
λ = 10^-8 failures per hour per gate, normalized time
is computed for five years, and the standby failure
rate is 1/10 of the active failure rate (d = .1).
Module set reliabilities are read from the appropriate
curves of Figure 4 as follows:



         CAU     AU      CU      MU      I/O

T        2.1     1.49    3.54    .85     .88
R        .12     .76     .24     .92     .91


Then system reliability is : 

R_s = (.12) (.76) (.24) (.92) (.91) = .018
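
The arithmetic of this first estimate can be checked with a short sketch. It is only an illustration under stated assumptions: five years is taken as 5 x 365 x 24 hours, the function names are hypothetical, the gate counts are those listed for the CAU and I/O in Table I, and the module set reliabilities are the values read from Figure 4 in the table above (they are not computed here).

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # assumption: leap days ignored


def normalized_time(gates, failure_rate_per_hour, years):
    """Normalized time T = (number of gates) x (gate failure rate) x (time)."""
    return gates * failure_rate_per_hour * years * HOURS_PER_YEAR


def system_reliability(module_set_reliabilities):
    """First estimate in the spirit of Equation (11): switches are neglected
    and the system reliability is the product of the module set values."""
    return prod(module_set_reliabilities)


# Gate counts for the CAU and the I/O unit, failure rate 1e-8 per hour per gate.
t_cau = normalized_time(4800, 1e-8, 5)
t_io = normalized_time(2000, 1e-8, 5)

# Reliabilities read from Figure 4 for CAU, AU, CU, MU, I/O (first configuration).
r_first = system_reliability([0.12, 0.76, 0.24, 0.92, 0.91])

print(f"T(CAU) = {t_cau:.2f}, T(I/O) = {t_io:.2f}, R_s = {r_first:.3f}")
```

Under these assumptions the normalized times agree with the tabulated 2.1 and .88, and the product agrees with the quoted system reliability of .018.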

Changes must obviously be made for a reasonable 
reliability. In Figure 5 a step toward higher reliability 
is demonstrated through several types of changes. The 
AU and CU are each divided into interchangeable 
halves and the AU provided with four spare halves 
while the CU has five. One unit each is added to the 
I/O and MU units. The CAU has been triplicated and
at the same time reduced to one half its former size.
The system summary is:
Figure 4 — Long term reliability curves


         CAU     AU      CU      MU      I/O

N        3       6       7       4       4
L        1       2       2       1       1
T        1.05    1.49    3.54    .85     .88


Reading the graphs for module set reliability : 


         CAU     AU      CU      MU      I/O

T        1.05    1.44    3.54    .85     .88
R        .89     .97     .8      .98     .98


Thus, R_s = .67, a significant gain in system
reliability, though additional steps must yet be taken
to reach the desired reliability. When a potential
configuration is reached, one must determine the
additional gates added to each module by virtue of the
new configuration and then calculate a corrected system
reliability. Usually one must iterate through several
configurations many times to reach the desired
reliability with a minimal gate count. At this point
Equation (8) may be used for a more accurate
reliability value.

CONCLUSIONS 

A method of estimating long term reliability of modular
computers has been presented and two sample cases
examined. In the second example 240 percent additional
hardware was used to improve five year predicted
reliability from .018 to .67. To this must be added
the additional switches to accommodate the increased
modules (from 13 in the first example to 24 in the
second). To obtain a reliability of the order of .99
for a five year mission, the additional hardware
necessary might amount to as much as four times that
required for the actual computing. Gate failure rates
used in the examples are for present day high quality
IC’s. If the basic gate reliability could be increased
by a factor of ten, this total additional hardware
could be approximately halved.

The modular approach with standby modules appears 
capable of servicing long missions with feasible costs. 
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TABLE I— Two column component breakdown (approximate) 
modular computer breadboard (separate arithmetic & control modules) 
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% 
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55 
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39 
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17 
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CAU 

4800 

25 

1440 

27 


Switches 

180 

1 

55 
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I/O 

2000 

11 
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10 


Memory 
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11 
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10 


Processor 
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52 

2700 
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A compatible airborne multiprocessor 


by E. J. DIETERICH and L. C. KAYE
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INTRODUCTION 

The control of large military forces is creating the need 
for large data-processing systems located in transport 
aircraft and in other situations where tight quarters 
and hostile environments call for the design features 
found in airborne systems. In these applications the 
configuration of the computer and its peripheral
equipment strongly resembles what is found in a typical
commercial data-processing system, with some
additional requirements for reliability. In particular, the
functional programs are complex and extensive, and 
the availability of a complete package of support soft- 
ware, including compilers and utility routines as well 
as the resident executive, is likely to be of critical 
importance. Because of its cost, so complete a software 
package cannot reasonably be developed specifically 
to answer a particular military need; it must be cap- 
tured from an existing software system. The only 
source of complete data-management software packages 
is commercial data-processing; and thus it makes 
practical sense for a large, militarized data-processing 
computer to be strictly compatible with an existing 
commercial product. As a bonus, the commercial 
computer can then be used as a support computer 
for compilation and program checkout. An example 
of a program in which an airborne computer is sup- 
ported by an existing ground-based commercial com- 
puter is found in the Strategic Air Command’s Post 
Attack Command and Control System — Airborne 
Data Automation.¹ In this system the airborne
computer is the RCA/USAF Variable Instruction
Computer² and the ground support computer is the IBM
7090.

The hardware compatibility required for capturing
system software is rigorous.³ It is not sufficient that
the militarized computer contain a large subset of the
commercial instruction list, or that it obtain nearly
identical results when executing the same programs.
Bit for bit, the militarized computer must possess all
the instructions and non-instructional features of the
commercial machine, including input-output features, 
with the possible exception of privileged instructions 
usable only by the resident executive program; even 
here the exceptions must be few or else an entirely 
new executive will be required. 

On long missions, especially when critical command 
data are being handled, the military user must have 
assurance that a certain minimum capability will 
always be available. Even with the best modern 
technology it is prohibitively costly to provide assured 
availability in a single-thread system. The classical 
method of coping with failure — complete duplication 
of the hardware, with a stand-by unit for every unit 
in active operation — is also unduly expensive. In most 
applications there are peak loads which occur rela- 
tively rarely, but which must be within the capacity 
of the system in its normal state, and the minimum 
essential capability is substantially less than the peak. 
What is called for is a fail-soft approach in which 
major components are duplicated but not allowed to 
remain idle. All components are used simultaneously 
to obtain the peak throughput, but the system can 
continue operation at reduced throughput in case of 
a failure. The failed component can be diagnosed and 
repaired without interrupting the operation of the 
surviving portions of the system and in a time short 
compared to the expected time to failure of the
identical surviving component. Thus the user has nearly
complete assurance against collapse of the entire
system.⁴

The multiprocessor hardware

A data-processing system capable of graceful
degradation is illustrated in Figure 1. Clearly, many other
types of peripheral equipment could be included. All
the peripheral control units are co-channelled, so that
if one input-output section of the central computer
should fail, another path would remain open.

The central computer, the Model 215 multiprocessor,
is shown in more detail in Figure 2. It consists of two
Central Processor Units (CPU), two Input-Output
Units (IOU), and from two to eight Main Memory
Units, interconnected by an essentially passive Signal
Distribution Unit (SDU). By a conceptually simple
redesign of the SDU, requiring, however, substantially
more hardware, the system could be expanded to
include four CPU’s, four IOU’s, and sixteen Main
Memory Units. Each of the active units is separately
powered and operates independently of other units of
the same type; for instance, any number of memories
can execute independent, overlapped cycles
simultaneously. The SDU is merely a mechanical package
housing the interconnections among the active units;
as the diagram suggests, such circuits as it contains
(largely line-drivers and receivers) are partitioned and
powered from the active units. The logical and
electrical designs conform to the constraint that a
failure in any active unit, or in its partition of the
SDU, must not interfere with continued operation of the
remainder of the system. Multiprocessors for
ground-based application similar in many respects to
this one have been previously described.⁵,⁶


Figure 1 — Typical multiprocessor application
(diagram labels: displays and keyboards; tapes,
printers, and other peripheral devices)


Figure 2 — Fail-soft computer configuration

If one IOU and one CPU are turned off or
disconnected, the uniprocessing system that remains is
functionally compatible with the RCA Spectra 70 series
of commercial computers.⁷,⁸ The entire instruction
set of the Spectra 70, including privileged instructions, 
is contained within the Model 215 as well as the four 
Program States, the input-output channel control, the 
interrupt management scheme, and all other features 
of the commercial counterpart. This paper describes 
some of the added instructions and other features 
which make it possible to operate both CPU’s and 
both IOU’s together, while retaining the Spectra 
compatibility in the sense that any user program 
compiled and debugged on a Spectra 70 will run 
identically on the Model 215. 

Either IOU can be commanded from either CPU, 
the choice depending only on the channel number desig- 
nated in the input-output instruction, and either 
CPU can accept interrupts generated by any peripheral 
device. Except for the few microseconds when it is 
actually receiving a command, an IOU operates com- 
pletely independently, transferring data between pe- 
ripheral devices and memory without disturbing the 
CPU’s. 

Each memory unit contains 16,384 32-bit words 
and performs a read-write cycle in 1.65 microseconds. 
Input-output data rates are approximately 400,000 
bytes per second for a multiplexor channel in the 
multiplex mode and 800,000 bytes per second for a 
selector channel. Each CPU executes short instructions 
at memory speed — for instance, an indexed add in- 
struction in 3.30 microseconds — with single-precision 
multiplication in 9.0 microseconds. Except when the
programs being executed in the two CPU’s happen to
share a memory bank, the total system throughput is
twice as great as for a single CPU and IOU. The
entire configuration shown in Figure 2, with eight 
memory units, occupies approximately 20 cubic feet, 
weighs approximately 1000 pounds, and consumes 
approximately 3550 watts. It is designed to meet the 
requirements of MIL-E-5400, Class I, the basic speci- 
fication for airborne electronics. 

The integrated circuits and medium-scale arrays 
used in the Model 215 are of military quality, with 
burn-in and screening. When the standard degradation 
factor for airborne application (five to one) is applied, 
the calculated mean time to failure of a unit (including 
an associated partition of the SDU) is 2200 hours for 
a Memory Unit, 1305 hours for a CPU, and 1820 hours 
for an IOU with a maximum channel capacity. The 
only single-thread element in the entire computer is 
the master oscillator, which is located in the SDU and 
redundantly powered by both CPU power supplies and 
which has an airborne MTBF of 100,000 hours. Use 
of redundant oscillators with a voting circuit was 
considered, but the calculated failure rate of the single- 
thread portion of the voting circuit turned out to be 
higher than that of a single oscillator. 

The purpose of fail-soft features is to permit an 
airborne mission to be completed successfully, even 
though portions of the hardware might fail. For a 
72-hour mission which can be successfully completed 
with the minimum capability provided by one CPU, 
one IOU, and six out of eight Main Memory Units, 
the probability of successful completion is greater than 
99 percent, even if no repair is possible during the 
mission. 
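
The 99 percent figure can be checked for plausibility with a rough sketch. This is not necessarily the calculation the authors performed: it assumes independent units with exponential failure at the quoted mean times to failure, all units powered for the whole mission, no repair, and the oscillator MTBF treated as a mean time to failure. The function names are hypothetical.

```python
from math import comb, exp


def unit_survival(mttf_hours, mission_hours):
    """Probability that one unit survives the mission (exponential failure)."""
    return exp(-mission_hours / mttf_hours)


def at_least_k_of_n(k, n, p):
    """Probability that at least k of n independent units survive,
    each surviving with probability p (binomial sum)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def mission_success(mission_hours=72.0):
    """Minimum capability: 1 of 2 CPU's, 1 of 2 IOU's, 6 of 8 Main Memory
    Units, plus the single-thread master oscillator."""
    cpu = at_least_k_of_n(1, 2, unit_survival(1305, mission_hours))
    iou = at_least_k_of_n(1, 2, unit_survival(1820, mission_hours))
    mem = at_least_k_of_n(6, 8, unit_survival(2200, mission_hours))
    osc = unit_survival(100_000, mission_hours)
    return cpu * iou * mem * osc


print(f"72-hour mission success probability: {mission_success():.4f}")
```

Under these assumptions the result comes out slightly above 0.99, consistent with the claim in the text.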

Recently there has been a great deal of activity 
aimed at producing concepts for software, and for the 
supporting hardware, which will facilitate parallel 
processing of tasks and portions of tasks.⁹,¹⁰,¹¹ In one
respect, the present work is much less ambitious, as 
no special hardware has been incorporated for the 
purpose of forking and joining parallel processes within 
a user program. In other respects, however, it is more 
ambitious, in that a rigorous attempt has been made 
to avoid single-thread hardware of any kind, and in 
the commitment to achieve the goals of multiprocessing 
and graceful degradation while capturing a complete 
software package designed for a family of computers 
not having these features. Certain aspects of the ex- 
ecutive software require special attention in order to 
realize the benefits of multi-processing, and the hard- 
ware must be designed from the outset with these 
requirements in mind. In normal operation the most
significant topics are the control of the executive, the
management of input-output interrupts, the
assignment of CPU's to working programs, and initial
loading. Problem recovery and self-diagnosis are the
critical aspects of graceful degradation.

Executive control 

In order to preserve the purity of the fail-soft
features, the Model 215 hardware is completely
symmetrical, with both CPU's and both IOU's identical.
It is therefore desirable for the executive program to 
avoid creating a master-slave relation between the 
CPU's. In fact, it is convenient to regard both CPU's 
as slaves, with the resident executive program as the 
master. Because programs checked out on the com- 
mercial support computer must run correctly on the 
Model 215, the executive must present the user program 
with precisely the same interface as the executive in 
the support computer and must allocate resources and 
manage input-output in a unified way for all programs. 
Because the executive supports the user programs in 
many complex ways, the multi-processing executive 
should be derived with minimum possible modification 
from an existing, commercial executive, so as to intro- 
duce the least possible chance of mismatches at the 
interface. The Tape-Disc Operating System¹² of the
Spectra 70 is an excellent example of a software system 
which has desirable characteristics for military data- 
management and which contains a multi-programming 
executive readily adaptable for multi-processing. 

The executive program must be protected from 
having its coding executed by two CPU's at once. 
Even if the coding were parallel reentrant, there are 
common tables carrying the status of programs and 
of input-output devices which can become garbled if 
two CPU's are allowed access at the same time. The 
executive can be partitioned into independent sub- 
routines, so that different portions may be executed 
simultaneously; nevertheless, the system must provide 
means for one CPU to lock the other one out of the 
subroutine it is currently executing. 

An ideal mechanism for locking out a CPU is the 
Test and Set instruction.¹³,¹⁴ This instruction tests a
specified byte in core memory and simultaneously sets 
it to all ones. If two CPU's attempt to execute this 
instruction on the same byte at the same time, the 
one having higher priority for access to memory will 
perform the operation first; if the tested byte was 
initially not set, the higher priority CPU will detect 
that fact, but the other CPU will not have access to 
the test byte until after the byte has been altered. 

Figure 3 is a flow chart showing how the Test and
Set instruction is used for executive lockout. The
initial instruction of each independent subroutine is
a Test and Set, followed by a Branch to direct the CPU
in accordance with the previous setting of the test byte.
There is a unique test byte for each independent
subroutine. If the test byte was not originally set, the
coding is available for execution and the program
proceeds normally. Another CPU arriving at the Test
and Set instruction immediately thereafter will find
the test byte set and will branch to the Timed Idle
instruction. The first CPU meanwhile completes the
disputed section of coding, restores the test byte to
its original form, and then executes an instruction to
terminate the idle condition of the other CPU. This
privileged instruction, a special feature of the Model
215, causes a signal to be sent out to terminate any
Timed Idle instruction then being executed. The idling
CPU then returns to the Test and Set instruction and
proceeds into the previously locked-out subroutine.

If the active CPU fails in such a way as to go into
a loop, the idling CPU will never receive a signal to
terminate the idle mode. The time-out feature is
provided in order to prevent “silent death” in this
case. The duration of the idle mode can be adjusted
in accordance with the expected time to execute the
subroutine being protected. The idle mode terminates
with a unique condition code if it times out instead of
being terminated by a signal from the other CPU;
the executive can then record the suspicion of a
malfunction in the other CPU before continuing with
normal processing or going to an error recovery routine.
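
The lockout sequence described above (Test and Set, branch, Timed Idle, restore, terminate idle) can be sketched in software. This is an illustration only, not the Model 215 instruction set: two threads stand in for the two CPU's, a private lock stands in for the memory-access priority logic that makes Test and Set indivisible, and an event with a time-out stands in for the Timed Idle instruction and its terminating signal. All class and function names are hypothetical.

```python
import threading


class TestByte:
    """A lockout byte. The private lock stands in for the memory priority
    logic that makes Test and Set indivisible in hardware."""

    def __init__(self):
        self._value = 0x00
        self._arbiter = threading.Lock()

    def test_and_set(self):
        """Return the previous value and set the byte to all ones."""
        with self._arbiter:
            previous = self._value
            self._value = 0xFF
            return previous

    def restore(self):
        with self._arbiter:
            self._value = 0x00


class LockedSubroutine:
    def __init__(self, idle_timeout_s):
        self.test_byte = TestByte()
        self.terminate_idle = threading.Event()  # stand-in for the idle-terminating signal
        self.idle_timeout_s = idle_timeout_s
        self.suspected_malfunctions = 0

    def enter(self):
        # Test and Set, then branch on the previous setting of the byte.
        while self.test_byte.test_and_set() != 0x00:
            signalled = self.terminate_idle.wait(self.idle_timeout_s)  # Timed Idle
            self.terminate_idle.clear()
            if not signalled:
                # Timed out: record the suspicion of a malfunction. A real
                # executive might go to error recovery; this sketch retries.
                self.suspected_malfunctions += 1

    def leave(self):
        self.test_byte.restore()   # restore the test byte first
        self.terminate_idle.set()  # then terminate the other CPU's idle


def run_two_cpus(iterations=200):
    """Two 'CPU' threads update a shared executive table under lockout."""
    section = LockedSubroutine(idle_timeout_s=0.05)
    table = {"updates": 0}

    def cpu():
        for _ in range(iterations):
            section.enter()
            current = table["updates"]  # read-modify-write that would be
            table["updates"] = current + 1  # garbled without the lockout
            section.leave()

    threads = [threading.Thread(target=cpu) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table["updates"]
```

Restoring the test byte before sending the signal matters in this sketch: a CPU that wakes and retries then finds the byte already cleared, and the time-out bounds the wait even if a signal is missed.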

Input-output interrupts 

In the Model 215 either CPU must be able to
command peripheral devices through either Input-Output
Unit. A user program may be executed partly by one
CPU and partly by the other; yet it must be able to
have access to peripheral devices on any channels,
whichever Input-Output Unit may contain those
channels. Certainly in case of CPU failure the surviving
CPU must have access to all channels. When an
input-output operation is completed, the peripheral device
generates an interrupt signal calling for CPU action.
In general the program which called for the
input-output operation may have been forced to wait for
its completion, and at the time of the termination
interrupt no CPU will be executing it; thus, there is
little reason to tie the servicing of a termination
interrupt to the CPU which initiated the input-output
action. Again, the requirement for continued system
operation in the face of failures demands that either
CPU be able to respond to an interrupt coming from
either Input-Output Unit.¹⁵
Figure 3 — Executive lockout

The processing of an input-output interrupt re- 
quires action from both a CPU and an Input-Output 
Unit. The CPU must execute a portion of the executive 
which maintains tables carrying the status of pe- 
ripheral devices and which updates the readiness status 
of the affected program. In order to do this, the ex- 
ecutive needs information on the status of the pe- 
ripheral device in question and of the channel through 
which it worked. At the time the interrupt is taken 
this information is obtained by the Input-Output 
Unit on command from the CPU and is stored in 
addressable registers in the Input-Output Unit. The 
executive gains access to the information by a Store 
Scratchpad instruction addressing the registers of the 
appropriate channel. Significant deviations from this 
pattern are prohibited by the requirements of com- 
patibility. 

Although either CPU must have the ability to re- 
spond to any input-output interrupt, clearly both 
CPU's must not respond to the same interrupt. For 
one thing, duplicate updating of the executive tables 
would be improper; although this could be prevented 
by judicious use of the Test and Set instruction, a great 
deal of time would be wasted in unnecessary
housekeeping. Moreover, in the very act of accepting the
interrupt the CPU automatically commands the
Input-Output Unit to obtain and store the device status
information, and this cannot be done twice for the
same interrupt without losing data. Flexibility of
response must be achieved without duplication of
activity.

In the Spectra 70, with which the Model 215 is 
compatible, interrupt signals reported to the CPU are 
stored in an Interrupt Flag Register. The Interrupt 
Flag Register contains a bit for each input-output 
channel, plus some additional bits for interrupts not 
related to input-output. There is also an Interrupt 
Mask Register, controllable by the executive program 
by means of privileged instructions; the contents of 
the Interrupt Mask Register designate specific inter- 
rupts to be taken as soon as they are requested and 
others to be held until it is convenient for the
executive program to respond. Normally, all input-output
interrupts are taken promptly unless the executive is
already responding to an interrupt.

The Model 215 requires a software convention to 
establish which interrupts will be taken by each CPU. 
This convention may be arbitrary, so long as each 
input-output interrupt is taken by one and only one 
CPU. An example of such a convention is illustrated 
in Figure 4. The executive program sets the Interrupt 
Mask Register of each CPU to take the interrupts 
designated for that CPU, and the hardware reports 
all input-output interrupt signals to both CPU's. 

When an interrupt is taken by the designated CPU, 
the corresponding bit in the Interrupt Flag Registers 
of both CPU's is reset. At all times both Interrupt 
Flag Registers contain a record of all the pending 
interrupts. In case of failure of one CPU, the executive 
program, as part of the recovery process, can alter the 
Interrupt Mask Register of the surviving CPU so as 
to permit that CPU to take all input-output inter- 
rupts. 

In the simplest case the Interrupt Mask Registers 
of both CPU's would be given permanent settings 
when the system is initially loaded, and these settings 
would only be altered in case of a CPU failure. If 
desired, however, the executive program can adjust 
the Interrupt Masks of the two CPU's in accordance 
with the priorities of the programs being executed, so 
that high priority programs are not interrupted. 
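
The convention can be sketched with bit masks. The following is a minimal model, not the Model 215 hardware: it assumes eight input-output channels, represents each register as an integer, and picks the lowest-numbered enabled channel when several are pending (the text does not specify an order). The class and method names are hypothetical.

```python
class InterruptLogic:
    """Interrupt Flag and Mask Registers for several CPU's, one bit per channel."""

    def __init__(self, masks, channels):
        self.all_channels = (1 << channels) - 1
        combined = 0
        for mask in masks.values():
            if combined & mask:
                raise ValueError("an interrupt is assigned to more than one CPU")
            combined |= mask
        if combined != self.all_channels:
            raise ValueError("some interrupt is assigned to no CPU")
        self.mask = dict(masks)               # Interrupt Mask Register per CPU
        self.flag = {cpu: 0 for cpu in masks}  # Interrupt Flag Register per CPU

    def report(self, channel):
        """The hardware reports every interrupt signal to all CPU's."""
        for cpu in self.flag:
            self.flag[cpu] |= 1 << channel

    def take(self, cpu):
        """Take one pending interrupt enabled by this CPU's mask, if any."""
        enabled = self.flag[cpu] & self.mask[cpu]
        if enabled == 0:
            return None
        channel = (enabled & -enabled).bit_length() - 1  # lowest set bit
        for other in self.flag:
            self.flag[other] &= ~(1 << channel)  # reset the bit in every CPU
        return channel

    def recover_from_failure(self, failed_cpu):
        """Let the surviving CPU's take all input-output interrupts."""
        del self.mask[failed_cpu]
        del self.flag[failed_cpu]
        for cpu in self.mask:
            self.mask[cpu] = self.all_channels
```

For example, with masks `{"A": 0b00001111, "B": 0b11110000}` an interrupt on channel 5 is recorded in both Flag Registers but is taken only by CPU B; after `recover_from_failure("B")`, CPU A takes interrupts on every channel, including any that were still pending.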

After an interrupt has been taken and the
Input-Output Unit has stored the device status information
in the appropriate channel registers, the contents of
these registers must be protected until the CPU taking
the interrupt has performed enough analysis to
identify the registers to be stored and to execute the
instruction to store them. In the meantime, the other
CPU, executing a different subroutine in the executive,
may attempt to start a peripheral device on the same
channel (or subchannel in the case of multiplexed
devices). This could destroy the information related to
the interrupt. An interlock is therefore incorporated,
preventing the other CPU from executing a Start
Device instruction until the channel registers have been
stored. The interlock carries a time-out feature, in
case the CPU taking the interrupt goes into a loop
before storing the channel registers; if the interlock
times out, the interlocked CPU is informed by a special
interrupt that a malfunction may have occurred. A
similar situation arises when an attempt to start a
peripheral device fails because of a malfunction or
other peculiar behavior of the device; status
information is again stored in the channel registers and
must be protected from destruction by the other CPU,
and the same interlock is invoked.


Figure 4 — Input-output control block diagram


Load balancing 

When the Model 215 is operating normally, two or 
more object programs, as well as the executive, are 
resident in core, and each CPU is executing one of 
them, independently of the other. From time to time 
one of the CPU’s may be diverted by an interrupt 
from the processing of an object program; it will then 
spend some time executing a portion of the executive. 
When the interrupt processing is completed, the inter- 
rupted CPU should return to processing an object 
program. The executive must insure that each CPU 
has productive work to do, and that the highest pri- 
ority object programs are being processed. 

An analogous situation arises in a computer without
a multi-processing feature if the executive is capable
of multi-programming. Here again there are several
resident object programs in addition to the executive.
At a given time, some of these programs may be forced
to wait for the completion of input-output, for loading
of additional program segments, or for other special
action on the part of the executive. It is the
responsibility of the executive to ensure that the highest
priority program capable of running is entered whenever
the executive itself does not require the use of the CPU.

The normal flow of a multi-programming executive 
is suggested by the flow chart of Figure 5. In the steady 
state the CPU executes an object program until an 
interrupt is taken. The interruption may be caused by 
the object program itself — typically a Supervisor Call 
to start an input-output operation — or by an external 
agency, as in the case of a termination interrupt from 
a peripheral device. In the first case, the program
interrupted may be unable to continue until some action
initiated by the executive has been completed; in the
second case, the processing of the interruption may
change some other program from a waiting status to
the status of being ready to run.

The multi-programming executive maintains an 
Operation List, a table of vital information about each 
program resident in core. In the Operation List for 
each such program there is a set of flags indicating 
the reason why a particular program is not ready to 
run. The flag bits are updated, as appropriate, by the 
processing of interrupts which affect the readiness of 
programs. If no flag is set, the program in question
is ready, and the executive, when it has completed its
other processing, exits by giving control of the CPU
to the highest priority program that is ready. At that
time the executive stores the identity of this program 
in the table of Current Operations, to establish an 
information trail in preparation for subsequent inter- 
rupts. 

This executive exit structure can be converted into 
a form suitable for multi-processing control by two 
simple changes. An additional flag bit is needed in 
the Operation List for each program; this bit tells the 
executive exit that the program is not only ready but 
is already being executed, and is updated each time a 
program is activated or interrupted. In the Spectra 
executive an unused bit is available for the purpose 
in the format of the flag byte in the Operation List. 
The other change is to carry an entry in the table of 
Current Operations for each CPU in the system.
Whenever a CPU passes through the executive exit, the
appropriate entry is updated. The only special
hardware needed is a means for the otherwise identical
CPU’s to identify themselves to the executive, so that
the proper entry in the table of Current Operations
may be used. This means is provided by an instruction
which stores the CPU identity, derived from its
plug-in position at the SDU, in a specified General Register.


Figure 5 — Load balancing

The multi-processing modification to the executive 
exit can be extended to handle as many CPU’s as 
desired merely by adding to the size of the table of 
Current Operations; additional flag bits in the Oper- 
ation List are not needed. The instruction permitting 
a CPU to identify itself must, of course, store enough 
bits to identify the number of CPU’s in the system. 
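
The modified executive exit can be sketched as follows. This is a simplified model rather than the Spectra executive: the field names, the numeric priority scale (larger means higher), and the `cpu_id` argument (standing in for the value returned by the CPU-identity instruction) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    """One entry in the Operation List."""
    name: str
    priority: int          # assumption: larger number = higher priority
    wait_flags: int = 0    # nonzero: some reason the program is not ready
    running: bool = False  # the added flag bit: already being executed


class Executive:
    def __init__(self, operations, cpu_count):
        self.operation_list = list(operations)
        # Table of Current Operations: one entry per CPU.
        self.current_operations = [None] * cpu_count

    def executive_exit(self, cpu_id):
        """Give this CPU the highest priority program that is ready and
        is not already being executed by another CPU."""
        previous = self.current_operations[cpu_id]
        if previous is not None:
            previous.running = False  # it was interrupted on this CPU
        candidates = [op for op in self.operation_list
                      if op.wait_flags == 0 and not op.running]
        if not candidates:
            self.current_operations[cpu_id] = None
            return None
        chosen = max(candidates, key=lambda op: op.priority)
        chosen.running = True
        self.current_operations[cpu_id] = chosen
        return chosen.name
```

Extending the sketch to more CPU's only changes `cpu_count`, which mirrors the remark above that only the table of Current Operations grows.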

A different sort of refinement permits the executive 
always to maintain all the CPU’s at work on the highest 
priority programs that are ready, even if the CPU 
taking an interrupt, and therefore passing through the 
executive exit, was itself executing a high-priority 
program when interrupted. Suppose, for instance, that 
three programs are resident. The highest priority 
program is waiting for the completion of an input- 
output operation, and when the termination occurs, 
the CPU executing the higher priority of the two re- 
maining programs takes the interrupt. In the normal 
course of events, the highest priority program would 
be designated as ready, and the CPU which took the 
interrupt would begin to execute it as soon as interrupt 
processing was completed. The result would be that 
the highest and lowest priority programs would be 
running, while the one with intermediate priority would 
be ready but would have to wait for a CPU to become 
available. 

For reasons connected with the requirement that 
system processing continue in the face of failure of a 
CPU, so long as one CPU survives, it is necessary for
the system to incorporate an instruction permitting
one CPU to gain the attention of the other. This 
privileged instruction causes an interrupt in the CPU 
being signaled. Using this signal, the executive can 
examine the priorities of all the programs currently 
being executed, and can interrupt a CPU which is 
executing a program lower in priority than some 
program not being executed at the time. A situation 
calling for such examination can only arise as a result 
of processing an interrupt; so provision for this ex- 
tended priority surveillance would be a natural exten- 
sion to the multi-processing modifications to the ex- 
ecutive exit. 
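
The extended priority surveillance can be illustrated as follows. This is a sketch under assumptions, not the executive's actual logic: the function name `cpu_to_signal` and the convention that smaller numbers mean higher priority are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def cpu_to_signal(running: Dict[int, int],
                  ready_waiting: List[int]) -> Optional[Tuple[int, int]]:
    """Return (cpu_id, priority_to_start) if a CPU should be interrupted.

    running maps each CPU identity to the priority of the program it is
    executing; ready_waiting lists priorities of ready programs with no CPU.
    Smaller numbers mean higher priority (an assumption of this sketch).
    """
    if not running or not ready_waiting:
        return None
    best_waiting = min(ready_waiting)
    # The CPU running the lowest-priority program is the candidate to signal.
    victim_cpu = max(running, key=lambda cpu: running[cpu])
    if running[victim_cpu] > best_waiting:
        return victim_cpu, best_waiting
    return None
```

In the three-program example above, once CPU 0 has taken up the highest priority program, the call `cpu_to_signal({0: 1, 1: 3}, [2])` selects CPU 1, which is running the lowest priority program, to be interrupted in favor of the intermediate one.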

Initial program loading 

Initial loading for a multi-processor differs little 
from initial loading of a conventional computer having 
a multi-programming executive. The operator's console 
has provision for selecting the device and channel 
through which the bootstrap routine will be loaded. A 
selected CPU is then started at a fixed address, where 
it encounters coding that initializes its General Reg- 
isters, Interrupt Mask Register, etc. Next the resident 
executive is loaded, followed by a number of object 
programs appropriate to the amount of core memory 
available. For multi-processing to be effective, there 
must be at least two resident object programs; and to 
avoid inefficiency caused by waiting for input-output 
terminations, preferably three or four. When the memo- 
ry is loaded, the selected CPU initializes the program 
counter of the other CPU and commands it to start. 
One after another, the CPU's execute the normal coding 
of a multi-programming executive to commence exe- 
cution of object programs. The extra flag bits in the 
operation list ensure that only one CPU will execute 
a given object program. 

The hardware requirements attributable to the multi- 
processor configuration are the provision for selecting 
one of the CPU's to execute the bootstrap coding, 
means for enabling one CPU to set the program counter 
of the other and command it to start, and means for 
a CPU to identify itself to the executive so that the 
proper Interrupt Masks may be established. 

Problem recovery 

In order for any system to continue in operation 
after a failure in one of its units, there must be sufficient 
checking built into the hardware. A failure must be 
detected before it has had a chance to propagate errors 
far into the problem being solved at the time of failure. 
The Model 215 Main Memory Units are checked by
the usual byte parity on the data, and by a parity
bit in the key memory which forms part of the memory 
protection feature. The Input-Output Units employ 
parity checking on data and on command words. In 
the CPU's the scratch-pad memory and the elementary 
operations stored in the read-only memory each con- 
tain parity bits. The arithmetic unit is two words 
wide, to speed up byte-oriented and double-precision 
instructions and to simplify their control; when oper- 
ating on single-precision data, the two halves of the 
arithmetic unit work in parallel and the results are 
compared. Whenever possible, data transmitted be- 
tween units has a parity check at both ends of the 
transmission path to facilitate diagnosis of faults. A 
small number of special checking circuits are incorpo- 
rated to check for faults not detectable by parity check- 
ing or arithmetic comparison. 

Another requirement for continuing operation is the 
preservation of enough information to permit resumption
of the program by the surviving units. The necessary
information, if it has not been destroyed by the
failure itself, will be found partly in memory and 
partly in various processing registers. In general, this 
information cannot be made available to an operator 
without the execution of some operations by the 
surviving processors; in other words, purely manual 
recovery is impossible. Since the hardware and soft- 
ware must include the capability of retrieving and 
identifying the information needed to continue pro- 
cessing, it is only a short step to providing for com- 
pletely automatic problem recovery in the majority 
of cases. 

Failures in the Input-Output Units and the pe- 
ripheral equipment present no problems unique to 
multi-processing. Data-transfer operations may be re- 
tried under program control. If the failure persists, 
or if the error information stored in the Channel 
Status byte indicates that the failure is in the Input- 
Output Unit rather than in the peripheral equipment, 
the peripheral device may be switched to a channel 
on the other Input-Output Unit; in the case of co- 
channelled devices the switching consists merely of 
making an entry in an executive table. 

When a failure occurs in a CPU, either of two modes 
of operation illustrated in Figures 6 and 7 may be 
followed. Normally there is a surviving CPU which 
can come to the rescue. In this mode the failed CPU 
stops dead as soon as the failure is detected, thus 
preserving the contents of its processing registers for 
problem recovery and fault diagnosis. In the act of 
stopping it sends a signal to the surviving CPU; this 
signal causes an interrupt into Program State P-4, the
normal state for responding to machine failure. It
also provides an indication, accessible to the program, 
that the reason for the interrupt was a CPU failure 
not in the CPU taking the interrupt. After decoding 
the reason for the interrupt, the surviving CPU can 
read out selected processing registers of the failed CPU 
and determine its status. One bit available in this 
way indicates whether or not the failed CPU had 
written into its scratchpad or into Main Memory 
before stopping. If not, all the data needed to repeat 
the instruction in progress at the time of failure is 
still available, and the program can be continued from 
that point. If data have been destroyed, the program 
must be returned to a restart point, exactly as in a 
conventional computer, but with the advantage that 
the restart can be initiated without human intervention 
if desired. 
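
The resume-or-restart decision made by the surviving CPU reduces to a test of the single status bit mentioned above. The following sketch assumes hypothetical names (`recovery_entry_point` and its parameters) and treats addresses as plain integers.

```python
def recovery_entry_point(wrote_before_stop: bool,
                         interrupted_instruction: int,
                         restart_point: int) -> int:
    """Choose where the recovered program should resume."""
    if not wrote_before_stop:
        # Operands are intact, so the failed instruction can simply be repeated.
        return interrupted_instruction
    # Scratchpad or Main Memory may hold partial results; fall back.
    return restart_point
```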

A second mode of operation is available to deal with 
a failure in the sole surviving CPU. Such failures will 
be rare, since the mean time to repair is very small 
compared to the mean time between failures; never- 
theless, because a large proportion of errors in any 
computer are normally transient, it is well to allow 
the CPU to attempt its own recovery. In this mode, 
which the executive can establish when its configuration 
table reveals that a particular CPU is the sole survivor, 
the failed CPU takes an interrupt to Program State 
P-4 and continues processing. If the error was transient, 
the CPU will be able to decode the interrupt and will 
then determine whether the program can be resumed 
or must be restarted. If, on the other hand, the failure 
is solid, the CPU will commit another error. This time, 
since the error indication is a failure of the CPU while 
in Program State P-4, the CPU stops dead, just as it 
does in the normal mode. Naturally, the problem 
recovery portion of the executive must not contain 
double-precision instructions whose arithmetic is not 
checked. 
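
The two modes of response to a CPU error can be summarized as a small decision function. This is an illustrative sketch only; the names `Action` and `on_cpu_error` are hypothetical, and the hardware makes this choice directly rather than by program.

```python
from enum import Enum


class Action(Enum):
    STOP_AND_SIGNAL_OTHER_CPU = "stop dead and signal the surviving CPU"
    INTERRUPT_SELF_TO_P4 = "take an interrupt to P-4 and attempt recovery"
    STOP = "stop dead"


def on_cpu_error(sole_survivor_mode: bool, in_state_p4: bool) -> Action:
    """Response of a CPU to a detected error, per the two modes in the text."""
    if not sole_survivor_mode:
        # Normal mode: preserve registers and let the other CPU take over.
        return Action.STOP_AND_SIGNAL_OTHER_CPU
    if in_state_p4:
        # A second error during recovery indicates a solid failure.
        return Action.STOP
    # First error with no other CPU available: assume it may be transient.
    return Action.INTERRUPT_SELF_TO_P4
```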

The flow charts of Figures 6 and 7 present condensed 
pictures of this process. The shaded boxes represent 
actions taken automatically by the hardware; the other 
boxes represent executive program action. Special in- 
structions are provided for the purposes indicated in 
the unshaded boxes with heavy borders. The dotted 
lines represent direct wire connections between the 
two CPU’s. In the normal mode, the surviving CPU 
takes control of the entire process as soon as an error 
is detected. It may enter a diagnostic routine after 
logging out the registers of the failed CPU; alterna- 
tively, the problem recovery routine may simply record 
the failure, reinitialize the registers of the failed CPU
and turn it back on, in the hope that the error was
transient. In either case, the surviving CPU determines
whether the program can be resumed or must
be restarted, sets the program entry point accordingly, 
and updates the Operation List flags for that program, 
so that it can be taken up by some CPU in accordance 
with its priority. 

In the other mode, of course, the failed CPU at- 
tempts to do all this for itself. The CPU that did not 
fail can observe the progress of the failed CPU by 
monitoring appropriate locations in the executive 
portion of memory; if these locations are not being 
properly updated, the failed CPU can be stopped and 
the processing used in the normal mode can be under- 
taken. 

Failure in a main memory unit is detected as a 
parity error and is reported to the CPU or Input- 
Output Unit currently working with the failed memory 
unit. An Input-Output Unit stores the error indi- 
cation in the Channel Status Byte and treats it exactly 
as a Spectra 70 does. If the parity error occurred during 
execution of an instruction, the associated CPU 
treats the error as illustrated in Figure 8. The other 
CPU is not notified, since the CPU receiving the error 
indication is presumably operable. 

As in the Spectra 70, the CPU takes an interrupt 
to P-4. After decoding data stored automatically in 
taking the interrupt, and determining that the cause 
was a memory parity error, the executive locates the 
instruction which caused the error. The Program 
Counter and Instruction Length Counter provide the 
necessary information. By means of a special in- 
struction — Check Parity — the parity error is localized 
to the instruction word or an operand. The Check 
Parity instruction moves data by words, correcting 
the parity of any error byte it encounters. The in- 
struction terminates automatically if a parity error 
is encountered and identifies the location of the error 
byte. It also indicates by a condition code whether or 
not it was completed without a parity error, but it 
does not cause an interrupt if a parity error occurs. 
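
The behavior of the Check Parity instruction can be modeled as below. This is a sketch under several assumptions that are not in the text: memory is represented as a list of (byte, parity bit) pairs, parity is odd, a word is four bytes, and the data movement itself is omitted so that only the checking is shown. The names `odd_parity_bit` and `check_parity` are hypothetical.

```python
from typing import List, Optional, Tuple

WORD_BYTES = 4  # assumed word size for this sketch


def odd_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of 1 bits (data + parity) odd."""
    return 0 if bin(byte & 0xFF).count("1") % 2 == 1 else 1


def check_parity(memory: List[Tuple[int, int]], start: int,
                 n_words: int) -> Tuple[int, Optional[int]]:
    """Scan n_words words of (byte, parity) cells beginning at start.

    Returns (condition_code, error_address): condition code 0 and None if the
    scan completed cleanly, or 1 and the address of the first byte whose
    stored parity was wrong. That byte's parity is rewritten correctly, and
    no interrupt is modeled.
    """
    for address in range(start, start + n_words * WORD_BYTES):
        byte, stored = memory[address]
        expected = odd_parity_bit(byte)
        if stored != expected:
            memory[address] = (byte, expected)  # correct the parity
            return 1, address                   # terminate at the error byte
    return 0, None
```

Applied first to the instruction word and then to each operand, a routine of this kind lets the executive decide which of them contained the error byte.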

Figure 8 - Executive parity error

Having located the parity error, the executive can
determine whether its own memory bank is affected.
If the error is not in the executive bank, then problem
recovery concerns only a single user program. This
program can be returned to a restart point in much
the same way as in a conventional computer. The
affected memory bank may be tested by a diagnostic
program, and if the error does not appear to be
transient, the bank may be removed from the system.
Compatibility with existing software requires that the
available memory be organized with consecutive
addresses, although obviously with a failed bank the
total amount of memory is reduced. The Model 215
has a special instruction which permits the executive
to reassign memory bank addresses. In this way, the
system can continue to process as many programs as
it can fit into surviving memory, regardless of which
bank fails.
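
The effect of reassigning memory bank addresses can be shown with a short sketch. The function name, the bank labels, and the bank size used in the example are hypothetical; the text states only that the surviving banks must present consecutive addresses.

```python
from typing import Dict, List


def reassign_bank_addresses(banks: List[str], failed: str,
                            bank_size: int) -> Dict[str, int]:
    """Give surviving banks consecutive base addresses starting at zero."""
    survivors = [bank for bank in banks if bank != failed]
    return {bank: index * bank_size for index, bank in enumerate(survivors)}
```

For four banks of 16,384 bytes with the second one failed, the surviving banks would be given base addresses 0, 16384, and 32768, leaving no gap in the address space seen by existing software.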

If, on the other hand, failure occurs in the memory 
bank containing the executive, special features are 
required in order to prevent the collapse of the entire 
system. The goal for the Model 215 is to confine the 
problem to the one program being serviced by the 
executive at the time of the failure. When the ex- 
ecutive bank has failed, it may be impossible to exe- 
cute even a single instruction after the interrupt is 
taken, and so the interrupt cannot be decoded. If this 
occurs in the Model 215, the hardware will automati- 
cally take a branch to the contents of a fixed General 
Register of Program State P-4. This branch will
normally point to a Recovery Nucleus located in a
different memory bank. If still another parity error
should occur, it would be an indication that the failure
was really in the CPU, and appropriate action would
be taken. 

The Recovery Nucleus, Figure 9, consists of copies 
of critical, dynamic executive tables, plus enough 
coding to load a fresh copy of the executive into a 
surviving memory bank. The Recovery Nucleus con- 
tains configuration tables showing which units are still 
operable. After identifying the memory bank into 
which it will load the executive, the Recovery Nucleus 
must quiet all peripheral devices in order to prevent 
accidental overlaying of the new executive as it is 
loaded. A special instruction is provided so that pe- 
ripheral devices may be quieted without destroying 
data or taking the termination interrupts.

Figure 9 - Recovery nucleus


If there is an interrupt on the device which carries 
the backup executive, that one interrupt must be 
taken. The current contents of the memory bank which 
is to accept the new executive are dumped, in order to 
avoid losing any programs which use it, and the new 
executive is loaded. A special feature of the Start 
Device instruction allows the Channel Address Word 
to be taken from any bank. The failed bank is taken off 
line, and memory banks may be readdressed if re- 
quired. After updating the configuration tables and 
transferring information from the Recovery Nucleus
tables to the new executive tables, normal operation
can be resumed. 
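
The order of operations performed by the Recovery Nucleus can be set down as an explicit plan. The sketch below is illustrative only: the function name is hypothetical, the configuration table is reduced to a mapping from bank label to an operable flag, and the choice of the first surviving bank is an assumption, since the text does not state a selection policy.

```python
from typing import Dict, List


def recovery_nucleus_plan(bank_operable: Dict[str, bool],
                          failed_bank: str) -> List[str]:
    """List the recovery steps, in order, for a failed executive bank."""
    survivors = [bank for bank, ok in bank_operable.items()
                 if ok and bank != failed_bank]
    if not survivors:
        raise RuntimeError("no surviving memory bank for the executive")
    target = survivors[0]  # selection policy is an assumption of this sketch
    return [
        f"select bank {target} for the new executive",
        "quiet all peripheral devices",
        f"dump current contents of bank {target}",
        f"load fresh executive into bank {target}",
        f"take bank {failed_bank} off line and readdress banks if required",
        "update configuration tables",
        "copy Recovery Nucleus tables into the new executive tables",
        "resume normal operation",
    ]
```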


Self-diagnosis 


On missions of long duration, the ability to locate 
and repair faults during the mission is necessary in 
order to give high assurance that the minimum essential 
capability will always be available. 

The fail-soft features of the Model 215 rest on the 
assumption that two failures do not occur simulta- 
neously — that is, that a second failure does not occur 
before the first failure has been repaired. The mean 
time to repair of the Model 215 will be less than thirty 
minutes. To make this possible without requiring high
skill levels, the Model 215 is packaged and powered
so that a unit may be repaired while the system is
operating; the logic is partitioned into functional,
replaceable cards; and there is provision for automatic
fault diagnosis.

Ease of access and functional partitioning are matters 
of straightforward engineering design. Likewise, the 
software techniques for automatic diagnosis of memory 
failures are well known and require no special hardware.
Fault isolation in an Input-Output Unit does
require some special hardware, chiefly to isolate
peripheral equipment from the faulty unit so that it
can be used with the surviving Input-Output Unit,
and a means of simulating signals from the peripheral
devices in order to exercise the unit to be diagnosed.

The real challenge is CPU failure. Even here, with 
well thought out functional partitioning, isolation of a 
fault to one or two functional cards is not too difficult 
if the processor can run a program. Unfortunately, the 
“hard core” of equipment that must be operable in 
order for a diagnostic program to run can amount to 
as much as 70 percent of the CPU. Diagnosing the 
hard core requires some form of external stimulus- 
and-measurement equipment. In the Model 215 the 
stimulus-and-measurement equipment for a failed 
CPU is the other CPU. 

In the normal mode of operation, as illustrated in 
Figure 6, detection of an error (except for a Main 
Memory Parity Error) causes the offending CPU to 
stop immediately, alerting the other CPU to the trouble. 
The failed CPU stops as soon as the error is detected, 
so that the contents of its scratchpad memory and its 
processing registers are undisturbed. By means of a 
small number of direct-wire connections between 
CPU's, the surviving CPU can gain access to the pre- 
served information. It then adopts the so-called “start 
small” technique of diagnosis. 16 The failed CPU can 
be commanded to put the contents of its memory 
buffer register onto the main inter-unit bus for ex- 
amination by the survivor and for storage in core. 
Data can also be written into the memory buffer. By 
cycling a few bit patterns through the memory buffer 
the diagnostic program can localize a fault within the 
memory buffer or can determine that the memory 
buffer is operable. In the first case the problem is 
solved; in the second case the memory buffer can be 
used as a dependable tool to explore further into the 
failed processor's hard core. 
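
The first step of the "start small" technique, cycling bit patterns through the memory buffer, might look like the following. The patterns, the 32-bit width, and the `write_read` callback (standing in for the direct-wire write and read-back of the failed CPU's memory buffer) are all assumptions of this sketch.

```python
from typing import Callable, List

# Assumed test patterns: all zeros, all ones, and two alternating patterns.
PATTERNS = [0x00000000, 0xFFFFFFFF, 0x55555555, 0xAAAAAAAA]


def faulty_buffer_bits(write_read: Callable[[int], int],
                       width: int = 32) -> List[int]:
    """Return the bit positions at which the buffer fails to echo a pattern."""
    mask = (1 << width) - 1
    bad = 0
    for pattern in PATTERNS:
        echoed = write_read(pattern & mask) & mask
        bad |= echoed ^ (pattern & mask)   # accumulate every disagreeing bit
    return [bit for bit in range(width) if (bad >> bit) & 1]
```

An empty result corresponds to the second case in the text: the memory buffer is operable and can serve as a tool for exploring the rest of the hard core.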

Other direct connections allow the contents of the 
read-only control memory to be read out to core for 
checking and also permit the surviving processor to 
command the failed processor to execute specific ele- 
mentary operations. In this way the contents of the 
processing registers can be examined and checked. In 
some cases the failure can be identified at once. For 
instance, if the error was in the arithmetic of a single 
precision operation, it will have been caught by the 
comparison check between the two halves of the 
arithmetic unit. The operation can be simulated step- 
by-step on the surviving CPU, while the failed CPU 
is driven through the operation by individually acti- 
vated elementary operations. When the simulation 
differs from the actual operation in one or the other 
half of the failed arithmetic unit, the simulation
program can identify the bit position in which the fault
originated. Since the arithmetic unit is partitioned
into four-bit slices, with all the arithmetic registers on
the same card, the proper card to replace is obvious at
once, and the operator can be notified.
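
The step from a disagreement between simulation and hardware to a replaceable card can be sketched as follows. The function name is hypothetical, and taking the lowest-order differing bit as the point of origin is an assumption of this sketch (on the reasoning that carries propagate toward higher-order bits); the text says only that the bit position can be identified.

```python
from typing import Optional


def locate_faulty_slice(simulated: int, observed: int,
                        slice_bits: int = 4) -> Optional[int]:
    """Return the index of the slice holding the lowest differing bit."""
    difference = simulated ^ observed
    if difference == 0:
        return None  # simulation and hardware agree at this step
    # Isolate the lowest set bit of the difference and find its position.
    lowest_bad_bit = (difference & -difference).bit_length() - 1
    return lowest_bad_bit // slice_bits
```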

If all the registers and transfer paths of the failed 
CPU are verified by this step-by-step exercise without 
locating the trouble, the hard core can be assumed to 
be working, and the failed CPU can be commanded to 
execute its own diagnostic programs. This will frequent- 
ly occur when the failure is intermittent or data- 
sensitive. Considerable field experience will be needed 
in order to determine the optimum point for turning 
over the fault-isolation process to the failed processor. 
Putting the failed processor back on its feet, either for 
executing its own diagnostic routines or following re- 
pair, requires setting the Program Counter and com- 
manding the processor to start executing instructions. 
The presence of the first capability is implied by the 
diagnostic process of loading and reading processing 
registers; the second is already needed for initial pro- 
gram loading. 

Whether the failed processor or the survivor is doing 
the work, the diagnostic programs can be entered into 
the Operation List and executed as if they were user 
programs. In this way, the diagnostic process can be 
carried along on a time-sharing basis in parallel with 
mission data processing. By adjusting the priority of 
the diagnostic program, the operator or the executive 
program can react to the relative urgency of accomp- 
lishing specific mission tasks as compared with restoring 
full processing capacity. 

SUMMARY 

A multi-processor with independent Central Processors, 
Input-Output Units, and Memories can provide graceful
degradation as well as sufficient compatibility with
a commercial computer to capture its entire operating 
system. Computing load and the management of input- 
output operations are balanced without establishing 
a master-slave relationship between Central Processors. 
Either Central Processor can diagnose the other one. 
By a combination of special hardware and software 
features the system is able to continue operation in 
the presence of failures, including failures in the
executive bank of memory.

SUMMARY

This paper discusses the dilemma posed by the promises 
made about large-scale integration, and the expecta- 
tions derived from the promises. Furthermore, it 
examines LSI’s present form. In some instances what 
have appeared to be “broken promises” are not in fact 
that at all. Some believers wanted to believe and thus 
have suffered from self-delusion. Some promises cer- 
tainly were unwise or premature, thus creating false 
impressions. But others represented a misunder- 
standing between component and system engineers. 

The paper both analyzes and interprets the history of 
the technical developments that led from transistors 
to integrated circuits to large-scale integration, describes 
some of the pressures that have led to premature 
promises, and characterizes the technology leading to 
LSI. Likewise, it looks at the introduction dates of 
systems, as well as of the components that have imple- 
mented such systems in the past. This review covers 
such integrated circuits as gates or amplifiers to LSI- 
type digital differential analyzers, which can perform 
a variety of calculations and logic, or, in another ex- 
ample drawn from the memory field, from shift 
registers to scratch pad and larger memories using 
semiconductor arrays. Likely near-term uses of LSI 
devices in new computer systems include the high- 
speed buffer memory in the 360 Model 85 (IBM), 1 and 
the MOS registers for some desk calculators (Auto- 
netics). 2 


In tracing the developments of the past by means 
of a few specific examples and then deriving broad 
generalizations, the paper ignores past mistakes to 
look at the optimum situation of realistic promises. 
Even there, however, one must compare the conflict- 
ing views on state of the art in components, such as 
LSI, to that in systems, such as computers. 

The system engineer’s state of the art is a system that 
has not been made before, and that may use both new 
and old components, though the new components 
must work in that system, should be reliable, and ought 
even to be in modest production; that is, they should 
have had some minimum of prior seasoning in develop- 
ment, application, manufacture, and use. In contrast, 
the component engineer’s state of the art is a building 
block never before accomplished, carrying with it the 
suggestion of new functional capabilities and applica-
tions. Clearly, these two concepts of the state of the
art differ radically from each other, causing misunder- 
standing even with realistic predictions. In the case of 
extravagant promises, these may later be considered 
misrepresentations if obstacles delay their realization 
as devices and in system use. 

Nevertheless, these views can be brought into har- 
mony with each other. The paper explains how each 
new concept, material, and component requires a 
certain time for development, application, and seasoning
before it is ready for a new system. Furthermore,
any new computer system using seasoned new
parts and components must itself be developed and
tested before introduction to the marketplace as a 
reliable new computer rather than as an infant prodigy. 
The question of the immediate realization of the promises
of LSI, not only in prototype components but also
in practical computer systems, is therefore closely
related to the time delays of our conventional design 
sequences leading to system manufacture. 

Some device manufacturers have made extravagant 
predictions regarding LSI without realistic qualifica- 
tions. Some qualifications have been eagerly ignored 
or have been misinterpreted by prospective users 
or system manufacturers. The acceptance of LSI in 
computer systems depends as much on a system 
designer’s skill, courage, and entrepreneurial ability 
as on the new inventions and developments of the 
component maker. These factors can only be wedded 
after a suitable “engagement and gestation” period, 
which for most new computers has not yet been com- 
pleted. Also costs and performance factors of competing 
technologies must be compared and discrepancies
resolved. A few examples of imminent system applications
will be cited.

The paper also discusses how both LSI and computer 
manufacturers share a vision of the future. The former
predicts a great potential for computers; the latter,
while remaining skeptical about today’s LSI accomplishments,
also shares this expectation. Readers,
then, should obtain a better understanding of the process
of assimilating new components into large systems.
With this knowledge, one should be able to predict
technological progress without constantly feeling
cheated by the amount of work required for its realization
and utilization. The challenge is how to utilize
the progress in technology that has led to LSI and
achieve the use of LSI in more capable computer sys- 
tems in the future. 

The promise of LSI 

Large-scale integration is derived from the extremely 
rapid evolution of the batch fabrication technology of 
silicon planar transistors. When scientists learned to 
fabricate hundreds and thousands of transistors next
to each other on a one-inch slice of silicon, like miniature
postage stamps, the idea presented itself of adding
cross-connections and only separating them in blocks
that contain all the interacting parts of a large gate or
flipflop. As this became a reality (integrated circuits
are not yet ten years old) and engraving became even 
finer, and functions smaller, engineers found they 
could place hundreds of such blocks of functions on one
(by now larger) slice of silicon. Again, the
thought was obvious: how many of these might one
be able to interconnect and leave together on one chip? 
Laboratory researchers continued to apply their in- 
genuity to provide first multiple gates, then several 
flipflops, then whole shift registers or adders on one 
minute silicon chip; and by 1966 claimed 1000 active
elements (about 100 bits) on one MOS integrated
circuit. 8 Soon one saw 16-bit bipolar memory circuits 
containing more than 100 active elements and tran- 
sistors (Figure 1). 

During the same time, electronic digital computers 
became even more complex. During the 1950’s, 1000 or 
more tubes, then perhaps 10,000 transistors, provided 
the computing power for data processors, and operated 
at microsecond speeds with magnetic core memories, 
containing similar numbers of cores and bits. By 1964 
however, the number of active elements in a computer 
reached 100,000, and some of the largest (CDC 6600) 
contained 1/2 to 1 million diodes and transistors as 
switching devices and used even more cores in their 
core memories. The expectation was that, if economical- 
ly feasible, one million to ten million gates and switching 
elements operating at nanosecond speed would be
required to provide the computing power desired for the
largest machines 4 and their memories (Figure 2). There
was much discussion on how this was to be achieved
economically, practically, and without an unreasonably
large effort in component assembly.

Figure 1 - Complexity of integrated circuits versus year
of laboratory accomplishment

As a result, computers were ready for integrated 
circuits — and they are now eagerly utilizing each 
generation of more complex ones, as each of these in 
turn offers acceptable performance, ever higher speeds, 
lower cost per device, and greater packing density. 
The expectations for large-scale integration have been 
derived from various pronouncements made by device 
makers as early as 1964, and in the several succeeding 
years. Initially, a greater pervasiveness of integrated 
electronics was proposed. 6 There followed a number of 
laboratory investigations of complex integrated circuits 
and extrapolations of their characteristics were pub-
lished. 6,7,8 These were quickly followed by analysis
of the potential advantages of LSI from the user’s
standpoint, 9,10 analyses of computer organization
architecture and partitioning, 11,12 as well as tempered
discussions of possible areas of utilization 13 and cost. 14
Computer architecture has developed that permits 
interaction and utilization of large blocks of com- 
ponents — i.e., subsystems — without delineating all com- 
binations of signals and their paths one by one. Thus, 
computer theory is capable of dealing with large-scale
integrated circuits, and engineers examine all new
offerings of component manufacturers to assess their
suitability for one or another potential application.

Figure 2 - Functional complexity of electronic computers

On the other hand, such complex subsystems as an 
LSI chip must embody far more thought and care in 
design 16 than a simple gate circuit. The LSI chip must 
contain more than just a repetition and interconnection 
of dozens of simple integrated gate circuits. In the past, 
subsystems of discrete components also had to be tested, 
modified, remeasured, and remodified many times 
before they were ready for use in a large complex 
computer. That represents a significant change from 
early days, in which a transistor, if it had enough 
sustaining and saturation voltage, gain, and switching 
speed at a given cost, was considered satisfactory for 
a new generation of transistorized computers. At that 
time, it may have required one year to shake down a 
transistor in a new logic circuit and three to five years 
to develop the rest of a complex system — or two years 
to develop the concepts of integrated circuits, with 
two to four years to complete the system. 16 It might 
now take three years to shake down LSI ideas and 
another one to three years to complete the system 
using them. This accounts not only for the develop- 
ment times required for a given product but for the 
total time required for developing the subsystem con- 
cepts and configurations, and adapting these to the 
newly conceived systems. 
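
The figures quoted above can be added up directly. The sketch below assumes, as the text implies but does not state outright, that the component shake-down and the completion of the system follow one another; the names `SCHEDULES` and `total_years` are introduced here for illustration only.

```python
# (component shake-down years, (min, max) years to complete the system)
SCHEDULES = {
    "transistor": (1, (3, 5)),
    "integrated circuit": (2, (2, 4)),
    "LSI": (3, (1, 3)),
}


def total_years(shake_down, system_range):
    """Total elapsed time if the two phases run one after the other."""
    low, high = system_range
    return shake_down + low, shake_down + high


TOTALS = {name: total_years(*phases) for name, phases in SCHEDULES.items()}
```

On these numbers each generation comes to the same four to six years from component to system; what changes is the share of that time spent on the component itself, which grows from one year to three.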

From such considerations various authors have derived
these expectations 6-14 for LSI circuits:

• Much more complex functions (logic, memory,
or other) on a single chip or a single package.

• Very low cost per elementary function or per bit.

• Far smaller size and relatively fewer connecting
leads than present computer circuits using inte-
grated circuits on printed circuit cards.

• Complete circuit compatibility with other semi- 
conductor active devices. 

• Off-the-shelf circuits or at least readily designed 
custom circuits, available with the strokes of a 
computer-controlled mask generator. 

• A silicon device factory operated like a “Brownie” 
photoprint shop: put in a negative and out comes 
a ten-cent deckle-edged glossy print. 

With such great expectations, it is not surprising 
that many predictions and promises were made by the 
device manufacturers. And many promises, even the
more extravagant, were believed. Most promises
made by device manufacturers were based on the
concept that a further aggregation of existing standard 
logic or memory circuits would be sufficient to fulfill 
such promises. Little did they anticipate that much 
new technology had to be developed in order to fulfill 
simultaneously all or most of the above expectations 
on cost, ease of design, compact packaging, and so on 
which had been individually predicted and promised. 
Furthermore, computer makers have scaled up their 
demands and expectations, and are attempting to 
clarify the technical and interface requirements on 
purchased subsystems — and a subsystem is what 
LSI circuits really are. 

Yet the makers of peripheral equipment, like dis- 
plays or desk calculators, or of small memory buffers, 
are close to the realization of such promises, and are 
probably within a year of producing the equipment 
based on the expectations for LSI circuits and the 
promises of their vendors. The interaction between 
vendor and user of LSI circuits is less time-consuming 
for peripheral equipment systems which are much 
less complex than large computers. 

The accomplishments of LSI 

Devices 

At this point, it may prove instructive to look at 
some of the accomplishments of the semiconductor in- 
dustry in more detail, from the invention of the transis- 
tor in 1948 to the complex circuits of the present 
time. 17-22 

Table I illustrates some key events in the steady 
progression of innovations utilized by the computer 
industry. The table shows not only the date of the 
laboratory announcement, but also the time (one to 
three years later) when such devices became available 
for purchase. Figure 3 presents this data in graphic form, 
plotting the circuit's complexity as a function of time. 
Note that, in addition to a delay in moving from the 
laboratory into first production, the mass production 
of silicon transistors really followed only after the 
planar process provided commercially useful devices 
at costs competitive with germanium transistors. This 
happened after 1960. 

Figure 3 also shows the complexity of the integrated 
circuits actually used in computers as a function of the 
system's introduction date — another year or two 
after production of such circuits was in full swing and 
produced reliable units at reasonable cost. The hori- 
zontal spread in years between these curves is a mea- 
sure of the time required — again and again, one might 
add — to turn new concepts from the laboratory into 



Figure 3 — Twenty-year growth of complexity toward 
LSI 

producible devices, and finally into reliable devices 
manufactured in large numbers at low cost. 

The time span also indicates the time required for 
systems manufacturers to become acquainted with 
the properties of such devices, utilize them in prototype 
designs, buy a few, and again a few more; and finally 
to purchase many more as their systems are sold. One 
must remember that a device reaches large-scale, low- 
cost manufacture only when the system for which it is 
destined is also sold in large numbers. 

To illustrate how many innovations must be accom- 
plished in translating a concept into a finished device 
and a manufactured integrated circuit, one can look 
at some of the key technical innovations 23-24 and de- 
velopments which led to the Minuteman II system in 
1966 (Figure 4). This system employed integrated cir- 
cuits in its guidance computer. 

Systems 

Let us now consider several computer systems 25 that 
first utilized various new semiconductor devices 
(Table II). The years 1951-1952, when the transistor 
had already been in existence for three or four years, 
saw the advent of some of the first electronic computers 
using vacuum tubes. The first commercial computers 
with germanium transistors were introduced in 1956, 
when the silicon diffusion techniques were just being 
announced by Bell Laboratories. Diffused silicon 
TABLE I — Dates of announcement of devices and circuits 

                                 DATE OF ANNOUNCEMENT 
DEVICE                           LABORATORY    FOR SALE 

Transistor discovery             1948 
Germanium transistor             1951          1952 
Grown silicon transistor         1954 
Diffused silicon transistor      1956          1957 
PNPN stepping switch             1956 
Planar silicon transistor        1958          1959 
Integrated circuits              1958          1961 
MOS registers (100 bit)          1966          1968 
Bipolar memory array (64 bit)    1968          1969 


TABLE II — Computer active devices and dates of first system shipment 

YEAR FIRST 
PRODUCED      SYSTEM               DEVICE TYPE 

1951          Univac I             Tube 
1953          IBM 701              Tube 
1956          Univac 1101          Germanium transistor 
1962          Telstar I            Silicon transistor 
1963          Minuteman I          Silicon transistor 
1965          IBM 360              SLT hybrid (silicon) 
1966          Minuteman II         Integrated circuit 
1966          Univac, RCA, etc.    Integrated circuit 
1968          Various              MSI scratch pad memory 
1969          Calculators          MOS-LSI registers 
1969          IBM                  LSI buffer (CACHE) 
1970          Various              LSI memory 


transistors did not find their way into computers until 
about 1963 with Minuteman I, and 1965 with the IBM 
SLT hybrids in the commercial Model 360. Monolithic 
integrated circuits did not appear until 1966-1967 
in military systems (Minuteman II) and commercial 
computers (RCA, Honeywell, UNIVAC, Burroughs). 
The first large computers that will incorporate LSI 
are still on the drawing boards, and are expected to 
emerge in the early 1970's. 
Figure 4 — From transistor to Minuteman II, a twenty- 
year sequence of innovations in solid-state devices 

The relationship between component and 
system innovations 

One of the reasons why systems do not immediately 
adopt a revolutionary concept is that the concept must 
have not only promise for the future, it must also com- 
pete in cost or performance with existing technologies 
in practical applications. Consequently, except for 
military applications that value lightweight or other 
factors of engineering performance more than cost, 
the germanium transistor was used in commercial 
computers only after it provided both higher speed 
and a lower cost than vacuum tubes. 

The same principle holds for each later development. 
In fact, integrated circuits exceeded the frequency 
performance of, and cost less than, most discrete silicon 
or germanium transistors only after 1965, and thus were 
not applied to commercial computers until about that 
time (Figure 5). 

The same applies to LSI; most types described or 
available today are not yet out of the laboratory or 
are only in pilot production. 22-29 These just about match 
the costs of more conventional MSI or low-cost inte- 
grated circuits. Vigorous competition is not yet ap- 
parent, though it is anticipated. 

Figure 6 traces the path of a system's components 
to some of its subsystems and systems, relating the 
previous data on the development dates of components 
and systems. For additional perspective, we have 
traced a few initial pertinent developments in materials 
and basic research. 
Figure 5 — Switching rate per dollar for computer logic 
elements 





Figure 6 — Tracing the development of new components 
into systems 


Interpretation 

One can examine 6,8,11,12 what must be accomplished 
in order to turn an assemblage of integrated circuits 
into a useful series of computer subsystems, whether 
logic, memory, or other. For example: 


• Improvements in LSI Manufacturing 
    smaller devices with finer mechanical and optical 
    accuracies 
    greater processing yields and lower costs 
    new package developments 
    multiple layer metallization and interconnections 

• Improvements in Design 
    computer-aided logic and circuit design and 
    tolerancing 
    automatic mask generation 
    test sequence and operation by computer 

• Improvements in Applications and Development 
    diagnostic routines and their auto mature 
    simulation 
    development of more appropriate architecture 
    and hierarchies for systems and memories 
    improved methods for reliability assessment 

It is apparent that much of the implementation in 
development of such LSI circuits borrows heavily from 
the computer field itself in terms of mechanizing the 
performance of engineering design, development, test 
and diagnosis at many levels of device, circuit, and 
subsystem engineering. This is in addition to the proc- 
ess improvements required in manufacturing the 
circuits. 

In the new medium of the silicon crystal, one can- 
not test, trouble-shoot, and correct breadboards in 
the traditional way — that is, by using an oscilloscope 
or meter and test probes. The circuits are too minute, 
too buried under other connections and insulating 
layers, for point-by-point signal tracing to be effective. 
Thus, both systems and device engineers must use new 
methods of diagnosis and analysis, must develop soft- 
ware and simulation techniques in order to understand 
what is going on within their own devices. This is 
clearly an age of computers building computers. The 
needs of the LSI laboratories in the semiconductor 
industry regarding computer design, simulation, and 
test make this very clear. 

The device maker and the computer builder are 
inevitably linked to one another. In fact, the device 
maker might turn to the computer builder and say, 
“We thought you already knew how to design, test, 
and diagnose logic and memory circuits by use of 
computers. But now we find that we have to learn this 
from the beginning.” 

Even with more rapid and effective utilization of 
computers in LSI design, manufacture, test, and im- 
provement, time delays must be expected between the 
first versions of this new concept and its becoming a 
reliable low-cost product, and between this intermediate 
step and the ultimate utilization in a large commercial 
electronic system such as a computer. Many interfaces 
must be matched, the previous, but still advancing, 
technologies must be overtaken, economic trade-offs 
performed, and investment decisions reached. Financial 
decisions are generally the most important, and these 
frequently require the longest to resolve in large or- 
ganizations. Confidence in the new LSI product must 
be established, its reliability examined, the credibility 
of its manufacturer and his delivery and cost promises 
examined, and finally any alternative approaches again 
compared. 

Of course this all takes time. But therein lies a dilem- 
ma. A new product will not get off the ground if some- 
one does not risk using it; its manufacture will not be 
initiated if there are not at least prospective customers, 
and establishing reliability is difficult and expensive 
without prototype system use and field testing. Con- 
sequently, it is tempting to brush away the dilemma 
by early promises and premature announcements. 
Many observers believe that without forward-looking 
claims such new concepts and developments would 
only evolve at a snail's pace. “Nothing ventured, 
nothing gained” certainly applies in this case. And the 
only valid realization of the promise of LSI is the de- 
livery of such circuits and their successful use in an 
electronic system. 

In interpreting the accomplishments to date, and 
the reasons why some expectations have not been 
realized, one discovers the following: 

• The definition and structure of an LSI computer 
are not fully understood, but are still evolving. 
Yet progress toward large-scale integration ap- 
pears inevitable. The semiconductor industry has 
a tremendous commitment and momentum toward 
further integration of circuits. 

• Considerable time is required for the exchange of 
ideas and their assimilation, in order to accomplish 
the experimental interaction required to turn con- 
cepts into practical embodiments in systems and 
to test these in the field. 

• While both component and computer industries 
may be learning from previous difficulties, many 
of the interactions required now between system 
and device designers remind one of a similar mis- 
match of expectations and performance requiring 
further interactions 30 during the early days of 
simple integrated circuits. 


Further expectations for LSI 

Some might conclude that the next step inevitably 
leads to the further integration of LSI — integration 
cubed or GSI (for Grand-Scale Integration). More 
likely, however, the evolutionary process will approach 
in various ways the concepts of molecular electronics, 
in which simple as well as extremely complex electronic 
functions are delineated and designed into the molecular 
arrangements of solids, such as a small chip of a crystal 
of silicon. Furthermore, it seems that the concept 
applied over and over is that of batch fabrication, ap- 
plied to a medium particularly well suited to this con- 
cept. 

The computer-on-a-slice concept may not soon be 
here; instead the memory-on-a-slice, the arithmetic 
processor-on-a-slice, the internal communication sys- 
tem-on-a-slice, or whatever, will be. The most likely 
subsystems amenable to implementation in LSI will 
be those suitable to repetitive batch processing, and 
those requiring relatively few connections to interface 
with other parts of the system. Significant in all cases 
is the repetitiveness of internal structure and parti- 
tioning that provides great functional capability with 
relatively few external leads. When there are a million 
devices inside one LSI chip, such will be called a “mega- 
electronic” device. But this version is still far into the 
future. Less complex circuits now provide good per- 
formance at low cost, and so will continue to be used 
for some time to come. But the underlying assump- 
tions — that by shrinking device size further one will 
gain both more devices per unit area and higher in- 
ternal speed — are real and lead to the expectations of 
still further increased performance at lower cost per 
function. 

While many practical difficulties must still be over- 
come, the fundamental physical limits 3 permit at least 
another order-of-magnitude improvement over the 
performance-to-cost ratios of many present integrated 
circuits of medium and large scale. 

Near-term applications most likely for LSI circuits 
are the following. 31,32 

Memory buffers 

The rapid increase in speed of logic circuits has forced 
modest progress in core memory speed and cost, but 
has far outstripped improvements in the speed of access 
of disc memories. Thus, opportunities for buffers be- 
tween disc and core memories, and between core memo- 
ries and fast logic circuits, exist. The LSI (MOS and 
bipolar) circuits are well suited to these respective ap- 
plications, and are now being tried aggressively by some 
designers. 

Small memories and logic systems 

The ease of interfacing with related integrated circuit 
logic makes semiconductor LSI memories very suitable 
and reasonably inexpensive for use in small systems. 
Pending applications are in desk calculators and in 
character generators for display. Existing medium- 
scale integrated (MSI) circuits are being used in lamp 
drivers and scratch pad 16-bit memory devices. 

Linear circuits 

Just as applications for linear integrated circuits 
lagged behind those for digital circuits, so LSI linear 
circuits are expected to develop more slowly than 
digital ones. However, a number of quite complex cir- 
cuits for TV and stereo radio have been developed by 
now, all of which certainly may be classed as medium- 
scale integration. Sophisticated operational amplifiers 
and active filters are also worthy of consideration. 

Other applications 

Another widely used circuit of the future is likely 
to be a serial or parallel address encoder/decoder, which 
can be set by means of external connections or preset 
by the manufacturer. This class of circuit will be utilized 
in remote signaling and TV tuning, intercoms, mobile 
communication sets, and automobile or other command 
multiplexing systems. It also resembles certain address 
encoders/decoders used in computer circuits. While 
most of the cited applications have not yet been devel- 
oped widely, they will require circuits ranging from 
four to 32 bits, which would barely be considered in 
the LSI class. Further applications are in digital dif- 
ferential analyzers and other specialized calculator or 
function generator circuits. 

CONCLUSION 

This paper has looked at some of the promises made by 
device developers about LSI and examined their ac- 
complishments so far. The inescapable conclusion is 
that only medium-scale integration is here today. It 
will be another year before large-scale integration will 
be available, reliably manufactured, and accepted for 
use in critical portions of electronic computers. 

It is also apparent from this paper that, in order to 
be applied in useful computer systems, technical in- 
novations must undergo further adaptation to the 


specific systems, and vice versa. This mutual improve- 
ment and development requires human interaction 
and communication 33 during months or years of time. 
Of course, one can only predict the orderly progression 
of technology and its gestation with time, and progres- 
sion and gestation may be speeded by new developments 
or delayed by unfortunate experiences. 

One can certainly expect the future evolution of 
large-scale integrated circuits and their increased 
participation in electronic systems — not only in com- 
puters, memories, and peripherals, but also in tele- 
phone and industrial systems; and in automobile, 
appliance and entertainment consumer products. 
Only the time scale is unknown. These visions of LSI 
are on the horizon — to predict when they will draw 
within arm’s reach is not the purpose of this paper. But 
once the first application has been successfully intro- 
duced, many more will follow rapidly. 
What has happened to LSI — A supplier’s view 

by C. G. THORNTON 

Philco-Ford Corporation 
Blue Bell, Pennsylvania 


INTRODUCTION 

Three years ago at the Solid-State Circuits Conference 
in Philadelphia, the concept of large-scale integration 
was already considered to be sufficiently far advanced 
as to be the main theme of the Conference, with a 
large number of technical papers showing beautiful 
colored slides of potential “products,” containing 
several hundred transistors interconnected with two 
layers of metallization on a single chip of silicon. Re- 
lated papers were presented at the Fall Joint Computer 
Conference that year, and semiconductor vendors had, 
for some time, been indicating the benefits that would 
accrue to the straightforward extension of the princi- 
ples of planar integrated circuits to more complex 
“subsystems” on a single piece of silicon. The concept 
appeared to be clear — all that remained was its imple- 
mentation; yet, as of the start of 1969, no major 
systems had been constructed with LSI and predictions 
of significant volume usage were still one to two years 
away. One can legitimately ask whether the darling 
of the industry a few short years ago has become the 
“bête noire” of today’s computer industry, or whether 
most of the problems have been solved and we are 
well on the way to practical commercial utiliza- 
tion? This paper reviews some of the more significant 
problems that have required solution during the past 
four years, in order for LSI to now begin to play its 
role as a major element in new system design. 

The situation can best be discussed in terms of the 
specific problem areas that have been encountered since 
1964, in attempting to implement LSI. These include: 


1. System design. 

2. Product design. 

3. Fabrication capability. 

4. Testing. 

5. Packaging. 

6. Reliability. 

It is the thesis of this paper that a number of specific 
problems existed in each of the above areas which 
would logically have been expected to require several 
years of effort in their solution. Each of these is dis- 
cussed. 

System design 

Since the functional density which can be practically 
obtained on a single MOS chip has led that obtainable 
with the bipolar approach, early LSI systems design 
experience was based on the use of MOS technology. 
Although individual MOS-LSI circuits were com- 
mercially available four years ago, sales for such de- 
vices to be used in conjunction with conventional 
components were very limited. It was quickly realized 
that it was nearly as difficult to build a cost effective 
computer system which partially used MOS-LSI, as 
it is for a person to become partially pregnant. For 
example, compatibility problems arose when systems 
were redesigned to use MOS rather than bipolar shift 
registers. Mixed systems were designed, only to find 
that by the time the cost of the interface circuitry and 
the clock drivers were included, it was more economical 
to use a larger number of smaller bipolar register 
elements. More significantly, attempts to partition 
parts of existing systems into blocks containing 100 
gates or more led to excessive interconnections to the 
discrete IC control circuitry, and to new packages 
containing up to 60 leads. Chip sizes tended to be 
14,000 mils² or larger, and it became a costly experience 
both to user and supplier to learn that such chips were 
at that time well beyond the state of the fabrication 
art. For optimal utilization of LSI, the system designer 
has found that he must rethink his system from scratch 
in terms of the new technology. He must be able to 
partition the system into tractable chip sizes with 
reasonable gate-to-pin count ratios, with considerable 
advance care required at the partitioning step to 
ensure the ability to test the resulting functions. It 
has also required studies, such as the LIMAC LSI 1 
demonstration vehicle, the design of small calculators 
and the appearance of a variety of standard LSI 
functions to assist in shaping new design concepts. 
These concepts include distributed control and memory, 
with integral chip decoding and encoding, and the use 
of read-only memory subroutines, among other tech- 
niques. Just as the active device count required to 
perform a function went up dramatically when the 
designer went from the use of discrete components to 
integrated circuits, the systems designer has had to 
learn to waste LSI circuitry efficiently in order to 
make his system design compatible with the tech- 
nology. 

Given that the entire system must be redesigned 
and the associated expense, it is not surprising that 
most initial LSI equipments have been limited in 
scope. To attack the broader problem of designing 
large LSI computer systems or major product lines of 
peripheral systems, only a few user companies out of 
the entire industry initially made the total commit- 
ment required (i.e., 20 to 50 engineers with available 
in-house or vendor prototype device fabrication facili- 
ties). Such programs typically started three to four 
years ago using MOS technology, and have just this 
year reached a level of completion where prior system 
commitments can be made. 

Product design 

Progress toward LSI may also have been impeded 
by the proffered viewpoint that the semiconductor 
vendor would supply the necessary partitioning and 
design capability. The semiconductor vendor suggested 
that he would integrate his facility upward to encom- 
pass subsystem design in much the same fashion as 
he had previously taken over much of the computer 
circuit design. On the contrary, many of the more 


successful total system programs today seem to be 
those where the vendor is supplying design rules relating 
to his fabrication capability, and the custom chip 
designs are being accomplished directly within the 
systems houses. In 1969, requirements have already 
existed for over 500 specialized custom chip designs 
needed by approximately a dozen users to implement 
prototype systems. The number of engineers required 
to accomplish these designs, even with a modern 
computer-aided design capability, far exceeds the 
number available in vendor companies. It would be 
irrational, moreover, to expect semiconductor device 
manufacturers with their general purpose circuit engi- 
neers to compete with major equipment houses in 
optimizing the partitioning and chip design in a variety 
of special system applications. Failure on the part of 
many system groups to get sufficiently involved in 
the design of custom LSI has slowed the rate of usage. 

The main thrust of the component vendors has 
been to increase the breadth and complexity of their 
“standard product” lines, since it is only through 
volume production of such standard products that the 
ultimate lowest costs per chip will be obtained. For 
certain classes of circuits, the standard product ap- 
proach is moving rapidly, with the development of 
such devices as shift registers, read-only memories, 
random access memories, A-to-D converters, D-to-A 
converters, BDA’s, parallel-to-serial and serial-to- 
parallel converters, counters, etc., being made availa- 
ble. 

Regardless of who designs the LSI components, the 
tools were simply not available to do the job until 
recently. As a minimum, the following are required: 

Logic simulation techniques 

Techniques are required for simulating the perfor- 
mance of the blocks obtained by a trial system par- 
titioning. Such simulation should include not only logic 
simulation, but should ideally take into account circuit 
delays. Some LSI systems designers have not been 
content to rely on computer simulation, but have 
constructed simulation cells, or macro versions of the 
subcircuits that they plan to work with, so that they 
can physically simulate the performance of the entire 
LSI chip. Such simulation techniques have been in 
development in a number of laboratories for several 
years, and several computer programs have also now 
been developed to attack this problem. 
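
As an illustration of what logic simulation with circuit delays involves, the following is a minimal sketch in Python, not a description of any program cited in this paper. The gate library, the delay values, and the netlist format are all assumptions chosen for the example. Each signal is given both a logic value and a settling time, the latter being the latest input arrival plus the gate's own delay.

```python
# Hypothetical gate library: gate type -> (boolean function, delay in ns).
# The delay figures are illustrative assumptions, not measured values.
GATES = {
    "NAND": (lambda a, b: not (a and b), 10.0),
    "NOR": (lambda a, b: not (a or b), 12.0),
}


def simulate(netlist, inputs):
    """Return {signal: (value, settle_time_ns)} for a combinational netlist.

    netlist maps an output signal name to (gate_type, input_a, input_b).
    Primary inputs are assumed to be stable at t = 0.
    """
    results = {name: (bool(value), 0.0) for name, value in inputs.items()}

    def evaluate(signal):
        # Evaluate each signal once, after the signals it depends on.
        if signal not in results:
            gate_type, a, b = netlist[signal]
            func, delay = GATES[gate_type]
            value_a, time_a = evaluate(a)
            value_b, time_b = evaluate(b)
            results[signal] = (func(value_a, value_b), max(time_a, time_b) + delay)
        return results[signal]

    for signal in netlist:
        evaluate(signal)
    return results


# An exclusive-OR block built from four NAND gates (a trial partition).
XOR_BLOCK = {
    "n1": ("NAND", "a", "b"),
    "n2": ("NAND", "a", "n1"),
    "n3": ("NAND", "b", "n1"),
    "out": ("NAND", "n2", "n3"),
}

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            value, settle = simulate(XOR_BLOCK, {"a": a, "b": b})["out"]
            print(a, b, int(value), settle)
```

A sketch of this kind covers only loop-free combinational logic with fixed delays; sequential circuits and delay spreads would need an event-driven approach.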

Standardized design approaches 

During the past three years, the cost of obtaining a 
few custom LSI chips from a vendor has remained 
remarkably constant in the range of $25,000 to $50,000, 
with several months to supply prototypes. Vendors 
and users have both attempted to improve the situ- 
ation by using computer aids, and in some cases by 
using a standard cell or building block approach. A 
typical design approach is shown in Figure 1. The 
individual steps may all be performed manually or 
they can be accomplished by a computer operation. 
The numbers in the corners of the blocks give a rough 
indication of the priorities in terms of the development 
of computer techniques to replace manual methods. 

It is noted that, after simulation and testing, higher 
priority is given to automatic mask generation than 
to the more complex problem of placement and routing. 
This stems from the need to eliminate the time con- 
suming and error prone operation of ruby cutting as 
well as the need to obtain the required precision without 
excessively large camera reduction. Most large MOS- 
LSI chips to date have been accomplished with manual 
placement and routing, with computer placement and 
routing just becoming an effective tool. 

Figure 1 — LSI product design 

The “standard cell” may vary in complexity all the 
way from a complete gate or flip-flop configuration 
to cells as small as individual transistor or line 
segments. The larger cells are easier to use in computer- 
aided design, and computer-aided placement and 
routing programs are more successful with this ap- 
proach. Although the technique does not achieve 
minimum area, it has permitted major reduction in 
prototype design and turn-around time. The near 
practical development of all of these techniques has 
taken three to four years to accomplish, with more 
improvement to come. 

Common design rules 

Another major obstacle has resulted from the fact 
that multiple sourcing of user design circuits requires 
a certain degree of unanimity among suppliers' design 
rules and processes. After four years of MOS process 
evolution, it is only this year that parts can be ordered 
from as many as three suppliers, using nearly the 
same set of masks. The situation in bipolar has been 
equally chaotic, with no effective second source capa- 
bility. More than one major system has gotten into 
serious trouble with a single source of LSI-MSI that 
failed to materialize. Other users are going to be very 
reluctant to move ahead with LSI, until some types 
of multiple sourcing can be found. 

Fabrication capability 

The ease with which photographs of large complex 
chips with multilayer metallization could be obtained 
for publication a few years ago has proved to be grossly 
misleading in terms of the magnitude of the technical 
problems. As a matter of fact, a number of fundamental 
technical problems initially existed which made it 
economically impossible to produce LSI devices. Three 
of the more significant of these are discussed here. 

These are: 

1. Defect density. 

2. Multilayer metallization. 

3. Mask making. 

Defect density 

Chief among problems discussed was that of defect 
density, but the tendency was to greatly oversimplify 
the expected solution to the problem. Many managers 
felt that the defect density would be reduced largely 
by “greater care in processing,” or “use of clean room 
facilities,” rather than requiring the development and, 
in some cases, the invention of totally new fabrication 
techniques to successfully produce these devices. 

In 1964, the defect problem was treated analytically 
by Murphy, 2 who showed that with the existing defect 







densities of several hundred/cm², economically practi- 
cal arrays could be expected to contain about 10 gates 
per chip on the order of 30 to 60 mils² in size. Further 
studies have shown that even with appreciable cluster- 
ing of defects, a 98 percent yield of single gates is 
required to obtain a reasonable yield at the 100 gate/ 
circuit level. 3 One approach to finessing the problem 
was through the discretionary wiring approach. Un- 
fortunately, this technique developed its own set of 
problems, which took twice as long to solve as originally 
estimated. The problem of eliminating the defects was 
also greatly underestimated by single chip LSI sup- 
pliers, and large chip yield forecasts were made which 
could not be met. The sources of defects were subtle 
in nature, and their solution has required chemical and 
metallurgical process changes in wafer preparation, 
photoengraving, metallization, and mask making. It 
has only been during the past year that, under labora- 
tory conditions and with several new process innova- 
tions, the required low defect densities (less than 
10/cm²) have been attained to permit fabrication of 
bipolar arrays containing hundreds of components on 
large chips. A good yield, circa 1969 (greater than 20 
percent), is illustrated in the wafer map shown in 
Figure 2 for 256-bit shift registers, each containing 
2067 transistors on 100 × 100 mil chips. 
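
The dependence of yield on chip area and defect density can be made concrete with a short calculation. The sketch below is illustrative only. It uses the simple Poisson model and the closed form commonly attributed to Murphy, Y = ((1 - exp(-AD)) / AD)², and it is an assumption that this is the expression used in reference 2. The compounding function treats gate failures as independent, which ignores the defect clustering mentioned above. All function names are hypothetical.

```python
import math

MIL_IN_CM = 2.54e-3  # 1 mil = 0.001 inch = 0.00254 cm


def chip_area_cm2(width_mils, height_mils):
    """Chip area in square centimetres from dimensions given in mils."""
    return (width_mils * MIL_IN_CM) * (height_mils * MIL_IN_CM)


def poisson_yield(area_cm2, defects_per_cm2):
    """Yield if defects are uniformly and independently distributed."""
    return math.exp(-area_cm2 * defects_per_cm2)


def murphy_yield(area_cm2, defects_per_cm2):
    """Yield from the form commonly attributed to Murphy (an assumption here)."""
    ad = area_cm2 * defects_per_cm2
    if ad == 0:
        return 1.0
    return ((1.0 - math.exp(-ad)) / ad) ** 2


def compounded_yield(single_gate_yield, gates):
    """Yield of a circuit of `gates` gates, assuming independent gate failures."""
    return single_gate_yield ** gates


if __name__ == "__main__":
    area = chip_area_cm2(100, 100)  # the 100 x 100 mil chip of Figure 2
    for density in (300.0, 10.0):   # defects per square centimetre
        print(density, poisson_yield(area, density), murphy_yield(area, density))
    print(compounded_yield(0.98, 100))
```

Under the independence assumption, a 98 percent single-gate yield compounds to roughly 13 percent for a 100-gate circuit, which shows why small per-gate losses dominate at LSI complexity.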

The defect problem was thought to be simpler with 
MOS, in view of the smaller number of processing 
steps. MOS arrays did, in fact, initially yield better 
in somewhat larger chip sizes than bipolar, with con- 
siderably higher yields on a per-component basis be- 
cause the active devices require less area than bipolar 
devices. However, the MOS limit was soon reached 



Figure 2 — Map of wafer of 256-bit shift registers 


at less than twice the chip size of bipolar, as it was 
found that each MOS process step was more critical 
than its bipolar counterpart. Specific MOS problems 
relate to the surface-sensitive nature of the devices, 
to the high fields which exist, and to the susceptibility 
of the thin gate oxide to contain specific types of defects. 

Many 1964-65 MOS circuits were fabricated with 
only 1000 Å of oxide in the gate region. The thin oxide 
was required in order to overcome the high level of 
fixed charge density, Q_ss, in the oxide, and obtain 
tractable levels of threshold voltage. Clock voltages 
in the range of 25 to 30 V were used to overcome the 
high threshold voltage characteristics of these devices, 
and obtain reasonable speeds. Thus, fields as high as 
3 × 10⁶ V/cm were impressed across the oxide, with 
even higher fields at any thin spots that might be 
process induced. If one examines the detailed topology 
of MOS integrated circuits, one also finds stepped 
regions in the oxide and metal edges where even higher 
field concentrations exist, where defect-free devices 
break down when overstressed. The maximum oxide 
breakdown field for near perfect planar metal-Si0 2 - 
silicon structures has been determined in this and 
other laboratories 4 to be approximately 10 7 V/cm, not 
allowing for thin spots in the oxide. Thus, these devices 
were extremely marginal in design. It remained for 
the industry to learn how to reduce and control the 
oxide charge, permitting thicker gate oxides to be 
used with greater safety margins. 
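
The field figures in this paragraph follow from dividing the applied voltage by the oxide thickness. The sketch below repeats that arithmetic, reading the quoted gate oxide thickness as 1000 Å and taking the 25 to 30 V clock range and the 10⁷ V/cm breakdown field from the text. The function names are hypothetical, and a uniform parallel-plate field is assumed, which ignores the edge and step concentrations the text mentions.

```python
ANGSTROM_TO_CM = 1e-8

BREAKDOWN_FIELD_V_PER_CM = 1e7  # approximate breakdown field quoted in the text


def oxide_field_v_per_cm(voltage_v, thickness_angstrom):
    """Uniform field across an oxide layer (parallel-plate assumption)."""
    return voltage_v / (thickness_angstrom * ANGSTROM_TO_CM)


def breakdown_thickness_angstrom(voltage_v):
    """Oxide thickness at which the field reaches the quoted breakdown value."""
    return voltage_v / BREAKDOWN_FIELD_V_PER_CM / ANGSTROM_TO_CM


for clock_v in (25, 30):  # clock voltage range given in the text
    field = oxide_field_v_per_cm(clock_v, 1000)
    margin = BREAKDOWN_FIELD_V_PER_CM / field
    print(f"{clock_v} V across 1000 A: {field:.2e} V/cm, margin {margin:.1f}x")

# A thin spot this thin would reach the breakdown field at a 30 V clock.
print(f"Critical thin-spot thickness at 30 V: {breakdown_thickness_angstrom(30):.0f} A")
```

Under these assumptions the nominal safety factor is only about three to four, and a local thinning of the oxide to roughly 300 Å would use it up entirely, which is consistent with the description of these designs as extremely marginal.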

In addition to the problem of leakage through the 
oxide, MOS device performance and stability depend 
on the control of a number of interface effects at the 
dielectric-semiconductor interface, most of which have, 
during the past four years, become well understood by 
physicists working in research and development labo- 
ratories, but whose control at the production level is 
only now becoming a reality. An example of the type 
of problem is that of field inversion where MOS devices 
lose their inherent isolation properties when the inter- 
face state density or field charge in the field oxide are 
allowed to vary. Specifically, in 1964, there were only 
a few effects associated with planar oxides that were 
of much concern to integrated circuit manufacturers. 
These included surface recombination which affected 
transistor β and diode leakage, and the presence or 
absence of surface contaminants on the oxide which 
were believed responsible for the occasional channelling 
problems on life tests. The high doping density and low 
voltages used in most bipolar circuitry made these 
devices relatively resistant to surface problems. With 
the advent of MOS devices, a number of additional 
effects became important, and new discoveries were 
Figure 3 — Distribution of charges in a MOS structure 


made which have now been determined to affect the 
yield of both MOS devices and the smaller geometry 
bipolar devices desired for LSI. These include the 
presence of fast- and slow-diffusing ions in the oxide, 
the presence of fixed charges in the oxide whose magnitude
is a function of processing conditions and applied 
fields, and a number of different kinds of minority 
carrier trapping effects in the oxide and at the interface.
The complexity of the problem is seen in Figure 3, 
which shows the location of charges in a planar oxide 
structure. 

High yield production of LSI devices requires special 
tests at each manufacturing step to control the im- 
portant oxide charge effects. Control charts in new 
areas must be maintained, and the effects of process 
variability on these effects must be well understood 
by production engineering. Whereas such control has 
been readily understood and applied in the R&D line 
and at the pilot line level, many companies have been 
slow to implement these procedures in production, 
due to the considerable re-education process that is 
required. More than one company has been severely 
disappointed in its attempts to place LSI in production.

Another important limitation in increasing the 
yield and reliability of LSI devices has been the fact 
that these very complex structures literally defy 
analysis of internal yield and reliability problems as a 
function of the terminal parameters of the finished 
device. Traditionally, single transistors had been in- 
corporated on each chip as an aid in process control, 
and for determining causes of low yield. 

As the detailed nature of the many sources of device 


problems has evolved, it has been necessary to devise 
special test structures, each used to examine a par- 
ticular effect in the absence of other effects. Two 
structures which are used for this purpose are illustrated 
in Figures 4(a) and 4(b). The structures shown test 
for the following individual effects: 

1. Transistor properties and field inversion, 

2. Mobile and fixed charge in the oxide, 

3. Fast and slow interface states, 

4. Surface ion migration and surface conductivity, 

5. Leakage between p regions and leakage in large 
and small periphery p-n junctions under a 
variety of oxide thicknesses and metal over- 
layers, 

6. Shorts and leakage through different thicknesses
of oxides over different surface conductivity
types and with varying topologies (small and 
large oxide steps), 

7. Metal and p-region resistance and electro- 
migration susceptibility under various localized 
conditions, 

8. Metal continuity over steps, 



Figure 4 — Test vehicles 
a. Surface effects test vehicle 
b. Oxide integrity and metallization test vehicle 


9. Contact resistance, 

10. Resistance of multilayer vias, 

11. Leakage through multilayer dielectric, 

12. First- and second-level metal resistance. 

Such test vehicles must be used in the laboratory, 
pilot production, and production operations to control 
the process and optimize the yield, yet they took two 
years to develop and apply after the basic effects 
were known. 

Multilayer metallization 

Most of the early LSI demonstration photographs 
showed multilayer metallization. In many cases, it is 
possible to obtain a 2:1 reduction in chip area with the 
application of an additional layer of interconnections. 
In the case of the discretionary wiring approach, its 
use was absolutely essential. Initially, it had been 
expected that the major problem in the use of two- 
layer metal would be due to shorts through the di- 
electric. This did turn out to be a very significant 
problem in the case of discretionary wiring, where an 
entire wafer is covered with second- and third-level 
insulated interconnections which must be free of shorts. 

In the case where the actual shorts through the 


oxide are not present, there may still be a large number 
of thin or weak spots which are susceptible to pre- 
mature breakdown. The evolution of a uniform high- 
strength dielectric for multilayer technology involved 
tests such as those shown in Figures 5(a) and 5(b). 
In this type of test, thin metal is used in the upper 
layer so that when a short develops, the energy dissipated
will evaporate the metal away from holes, 
thereby “clearing” the short and restoring the original 
condition. Thus, it is possible to impress consecutively 
higher and higher voltages between the two layers, 
exposing the weak spots one-by-one, until the ultimate 

Figure 5 — Breakdown strength of oxides in a multilayer test vehicle 
a-1. Dielectric strength test vehicle 
a-2. Enlargement showing self-healed pinhole 
b. Pinhole density for silane vapor-plated and RF-sputtered SiO₂ on 
delineated aluminum, 5,000 Å and 10,000 Å thick (horizontal axis: 
voltage in volts) 

dielectric strength is determined. The “stair-step” 
plots, shown in Figure 5(b), show the wide differences 
between silicon oxide dielectric layers prepared with 
differing processing conditions. It is now possible to 
prepare both chemical vapor-deposited and sputtered 
SiO₂ layers which are virtually free of shorting-type 
defects within the area of a single LSI chip, and success 
is also being reported on large discretionary wired 
wafers with a combination of these techniques. 
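
The stair-step plot described above can be thought of as a cumulative count of self-healed shorts per unit area as the test voltage is raised. The sketch below builds such a curve from a list of voltages at which shorts cleared. The function name, the test area, and the sample voltages are all hypothetical and are not data from the paper.

```python
from bisect import bisect_right


def pinhole_density_curve(clear_voltages, area_cm2, voltage_steps):
    """Return (voltage, cumulative cleared shorts per cm^2) pairs.

    clear_voltages: voltage at which each weak spot shorted and cleared.
    voltage_steps:  the voltages at which the curve is evaluated.
    """
    events = sorted(clear_voltages)
    return [(v, bisect_right(events, v) / area_cm2) for v in voltage_steps]


# Hypothetical test record: five weak spots on a 0.5 cm^2 test vehicle.
clear_voltages = [12.0, 35.0, 35.0, 80.0, 140.0]
curve = pinhole_density_curve(clear_voltages, area_cm2=0.5,
                              voltage_steps=range(0, 201, 50))

for voltage, density in curve:
    print(f"{voltage:>3} V: {density:.1f} pinholes/cm^2")
```

Each step in the curve marks one or more weak spots being exposed; a dielectric with few weak spots shows a curve that stays near zero until the voltage approaches the ultimate dielectric strength.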

In the case of smaller chips, opens proved to be of 
more significance than shorts, with problems developing 
at the vias between upper and lower metallization 
levels. In order to limit their size, such vias must be 
kept small in area, and it was quickly determined 
that the presence of thin oxide layers or other con- 
taminants at these points would produce either opens 
or an unacceptable amount of via resistance. Under 
non-ideal conditions, a test structure such as included 
in Figure 4(b), containing 18 vias in series, commonly 
shows resistances on the order of 10 to 20 ohms. In 
some LSI circuits, the tendency for a high resistance 
to be present is increased by the occurrence of cell 
potentials, which produce an anodizing effect during 
via etching, and which is a function of a particular 
circuit topology. Thus, the same four vias in a circuit 


containing 120 vias might be found to be open without 
any obvious reason. Metallization problems also developed,
with electrical opens occurring at the point 
where the upper-level metal steps down over an abrupt 
oxide cut to reach the first-level metal; the metal 
at these points tends to become constricted. It appears 
to have taken the better part of two years of effort in 
various industry laboratories to develop multilayer 
processes to the point where they can be used to 
achieve yield and reliability levels competitive with 
single-layer metal products. Even so, rules governing 
the via area and the shape of the via cut must be carefully 
chosen and strictly adhered to. 

The application of multilayer metallization to MOS 
is less critical for via resistance, since the circuits 
operate at high impedance levels. A different type of 
fundamental problem arose, however, when it was 
found that the application of the second layer of dielectric
caused drastic changes in the electronic properties
of the first-level silicon-oxide interface. 

Not only do temperature and radiation effects (in the 
case of sputtering processes) exist, but rapidly diffusing 
impurities can be introduced which penetrate to the 
original interface and alter the charge condition. Thus, 
the same level of new understanding and special process 
control is required as was the case in the original de- 
velopment of stable high performance MOS devices. 

Mask making 

In 1964 and 1965, severe problems existed in mask 
making which alone would have made it impossible 
to manufacture LSI. Problems existed in both image 
quality and image registration. 

In the case of image quality, lenses were not generally
available to handle the conventional 10X final step-and-repeat
reduction with a sharp field in an area 
greater than a 75 × 75 mil chip. Attempts to step at 
a larger size and then reduce a multiple pattern were 
also limited by the lens quality and photoprocessing 
techniques, so that considerable size and corner compensation
had to be built into the original artwork to 
obtain something close to a usable mask. 

As better lenses became available, image quality 
improved, but problems remained in sizing and regis- 
tration which still limit maximum practical array size. 
A high yield of circuits of typical “state-of-the-art” 
design generally requires the placement of successive 
images, one within the other, with a separation of a 
tenth of a mil, and a tolerance on this of 0.05 mil. In 
an LSI device, one might logically wish to obtain 
such registration at opposite ends of the diagonal of 
a 115-mil-square chip. In the mask-making stepping process 
alone, three sources of error occur (under optimum 
conditions) which affect this registration: (1) vertical 
stepping error ±0.01 mil, (2) rotational stepping error 
±0.01 mil, (3) size reduction error ±0.02 mil (1 part 
in 4000 reduction error over a 2" stepping table travel). 
Adding these tolerances leads to ±0.04-mil registration
error in the mask, which means that the processing
operator must align her mask during device 
fabrication to ±0.01 mil, a bare possibility. Thus, 
any attempt to fabricate circuits at sizes larger than 
115 mils on a side with 0.010 mil registration requirements
has automatically placed severe limitations on 
the expected yield, and practical bipolar LSI design 
rules have therefore been kept to larger tolerances or 
smaller chip sizes. Unfortunately, optimum MOS performance
demands even tighter design rule tolerance 
(±0.08 mil on gate overlap). 
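
The registration budget in this paragraph is a worst-case sum: the three stepping errors are added and the total is subtracted from the 0.05 mil tolerance. The sketch below repeats that arithmetic, reading the rotational stepping term as ±0.01 mil. The names are hypothetical, and `Decimal` is used only so that the hundredths of a mil add exactly.

```python
from decimal import Decimal

# Worst-case stepping errors from the text, in mils (plus or minus).
STEPPING_ERRORS_MIL = {
    "vertical stepping": Decimal("0.01"),
    "rotational stepping": Decimal("0.01"),
    "size reduction": Decimal("0.02"),
}

# Tolerance on the 0.1 mil separation between successive images, from the text.
TOTAL_TOLERANCE_MIL = Decimal("0.05")


def worst_case_mask_error(errors):
    """Linear (worst-case) sum of the individual error magnitudes."""
    return sum(errors.values(), Decimal("0"))


def operator_alignment_budget(total_tolerance, mask_error):
    """Tolerance left over for manual mask alignment during fabrication."""
    return total_tolerance - mask_error


mask_error = worst_case_mask_error(STEPPING_ERRORS_MIL)
budget = operator_alignment_budget(TOTAL_TOLERANCE_MIL, mask_error)

print(f"Mask registration error: +/-{mask_error} mil")
print(f"Left for operator alignment: +/-{budget} mil")
```

Because the errors are summed linearly, four fifths of the tolerance is consumed before the wafer reaches the alignment step, which is the point the paragraph makes about chip sizes above 115 mils on a side.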

Testing 

LSI raised many new problems in testing, some of 
which were initially recognized and some which only 
became evident when manufacturers attempted to 
move LSI testing to the production level. It is now 
generally recognized that for circuits containing more 
than 50 gates, one cannot practically exercise all of 
the logic contained on the chip as a method of testing, 
since the time required to accomplish this quickly 
stretches into many hours or days per circuit; rather, 
test programs must be computer-generated which rely 
on the fact that only certain kinds of faults can practi- 
cally exist in the device, and which merge redundant 
test patterns. Some fault conditions can only be de- 
tected by the introduction of specially constructed 
error inputs. Even with such factors taken into ac- 
count, however, an effective test sequence can only be 
expected to become available when the test problem 
has been taken into account at the time of system 
partitioning and circuit design. In some cases, it is 
necessary to break feedback lines on a chip to reduce 
sequential networks to combinational networks, albeit 
at a sacrifice of gate-to-pin ratio. 
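
The claim that exhaustive testing quickly stretches into hours or days can be illustrated by counting input and state combinations. The sketch below is a rough estimate only: the input counts, state-bit counts, and the tester rate of one million patterns per second are hypothetical values chosen for illustration, and for sequential logic the count is a lower bound, since extra patterns are needed to steer the circuit into each state.

```python
def exhaustive_patterns(n_inputs, n_state_bits=0):
    """Number of distinct (input, internal state) combinations."""
    return 2 ** (n_inputs + n_state_bits)


def hours_to_apply(n_patterns, patterns_per_second):
    """Time to apply every pattern once at a fixed tester rate."""
    return n_patterns / patterns_per_second / 3600.0


TESTER_RATE = 1_000_000  # patterns per second; hypothetical, not from the text

# (inputs, state bits): hypothetical circuit sizes.
for n_inputs, n_state in ((20, 0), (20, 20), (30, 30)):
    patterns = exhaustive_patterns(n_inputs, n_state)
    hours = hours_to_apply(patterns, TESTER_RATE)
    print(f"{n_inputs} inputs, {n_state} state bits: "
          f"{patterns:.3e} patterns, {hours:.3g} hours")
```

Breaking feedback lines, as the text suggests, removes the state bits from the exponent, which is why it can be worth the loss in gate-to-pin ratio.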

At best, a formidable problem still presents itself. 
Two of the major contributors to this problem are: 
first, the inability to test the “inner stages” of the 
array, resulting in an inordinate number of tests neces- 
sary at the inputs to guarantee the proper outputs, 
and second, the complex test sequence generally exceeds 
the capability of available test equipment and might 
be expected to add a disproportionate amount to the 
total cost of the device. As the level of integration 
increases, the number of actual chips per system will 
decrease, but the cost of testing fewer (but more com- 


plex) chips can become the most significant contributor 
to the final cost of the unit. 

The testing of sequential logic can be considerably 
complicated by the necessity of first applying a se- 
quence of input patterns to force the output into a 
particular state. Consequently, consideration must be 
given to the sequence of the input patterns to ensure a 
complete functional test. As with combinational networks,
a test pattern for sequential networks can be 
reduced through the use of computer-aided test minimization
programs. However, these programs can be 
quite long, and hence expensive, since many distinguishing 
sequences are necessary to check the possible failure 
modes. 

Although the problem of generating sufficient test 
programs has in many cases been satisfactorily resolved, 
the problem of how the testing is to be performed on a 
manufacturing basis is still largely undecided. As of 
the New York IEEE Show in March 1969, for example, 
only two or three pieces of commercial equipment 
were being offered for LSI testing, and in general 
these equipments are either considerably limited in 
capability or are very expensive, as applied to single 
operator handling. Examination of these equipments 
and other individual test equipments, which exist in 
individual companies, would suggest that we are still 
in the first generation of LSI test equipment develop- 
ment. Progress in the commercial use of LSI will 
continue to be impeded until this problem is resolved. 

Packaging 

LSI raised many new problems associated with 
packaging these devices. Early attempts at LSI system 
partitioning led to poor gate-to-pin ratios in an at- 
tempt to maintain maximum system flexibility which 
in turn required large numbers of bonds. Initially, 
attempts at packaging such LSI were extensions of 
the then available flat pack techniques attempting to 
maintain a minimal periphery chip with a large number 
of closely spaced leads. This configuration led to a 
shorter seal length than had been determined by the 
package industry to be required for freedom from 
leakers. The urgent necessity for having packages 
suitable for prototypes also led to the use of less than 
optimum procedures for fabricating and sealing these 
packages. Sealing techniques which worked well on 
small integrated circuit packages failed to seal properly 
when the package periphery became large, and special 
techniques had to be developed. Conventional leak 
test procedures cannot be applied, since the larger flat 
packages will not withstand the same test pressures 
and the larger internal volume requires excessively 
long pressure tests to detect small leaks. The larger 
number of pins also put new requirements on the wafer 
and chip bonding processes. Large chips are more 
likely to have voids in the chip-to-header bond, and 
a larger number of wire bonds have to be made without 
a bad bond in order to obtain a finished device at high 
yield. One solution to this problem has appeared to 
be in the direction of beam lead or flip-chip techniques. 
The applicability of such techniques to large numbers 
of interconnections has relatively recently been demonstrated,
as in the case of the semiconductor memories 
described by Kraynak,⁶ Agusta⁶ and Alexander.⁷ Most 
of these approaches have required additional process- 
ing steps on the wafer to obtain the required bonding 
materials at each interconnection site. 

High speed LSI arrays have also placed new demands 
on packaging from a power dissipation standpoint. For 
example, an array of 100 high speed gates, each dissi- 
pating 50 mW, would produce a total dissipation of 
5 watts, which has been beyond the state-of-the-art 
of conventional IC packaging. High speed LSI has 
therefore required considerable research into methods 
of obtaining high speed at lower power levels, and this 
has required smaller geometry structures to minimize 
capacitance — thereby making the large LSI circuits 
more difficult to produce at a reasonable yield. 

Reliability 

One of the originally stated reasons for going to 
LSI has been to increase reliability by decreasing the 
total number of interconnections and packages in the 
system. This may be true for a system of fixed capa- 
bility, such as a desk calculator or a computer terminal. 
On the other hand, in large systems, LSI is more often 
viewed as a means of economically increasing the total 
system complexity to perform more tasks, rather than 
as a means of decreasing the package count for pre- 
viously designed systems, in which case the MTBF 
for the total enlarged system is of concern. 

The advent of LSI brought into the picture a new 
range of potential reliability problems that have to be 
resolved. 

Since LSI devices are more complex, they require 
more metallization per chip. The larger number of 
pins in LSI leads to an increased number of interfaces 
between the chip and the package, and it is at these 
locations — wire bonds and chip bonds — that the princi- 
pal failure modes occur in silicon integrated circuits. In 
fact, metallization and wire bond failures account for 
approximately 60 percent of all conventional inte- 
grated circuit failures. Thus, reliability may suffer on 
a per package basis. 


In the case of high density LSI in a conventional 
type of IC package, dissipation is increased to the 
point where the circuit elements are operating con- 
siderably closer to the maximum allowable junction 
temperature than would be the case for individually 
packaged lower complexity IC's. Derating to increase 
reliability is not as feasible and it has become im- 
portant to explore the long term degradation of de- 
vices at these higher temperatures that can no longer 
be considered an accelerated condition. 

Failure rates on a per package basis are necessarily 
increased by this effect, and the MTBF for the entire 
system must be re-evaluated to make sure that the 
expected benefits are in fact being obtained. 

Perhaps the area of greatest difficulty in insuring 
LSI reliability is in the application of the screening 
techniques that have been accepted for use in inte- 
grated circuits. Typically, visual, mechanical, thermal 
and operational screening of the final product is re- 
quired. The final in-process screens should be per- 
formed at stress levels sufficiently stringent to remove 
all devices which contain potential reliability hazards, 
but the screen levels imposed must not degrade the 
inherent reliability of those devices which survive the 
screening sequence. Unfortunately, the screening levels 
adopted for conventional integrated circuits 
may not be applicable in general to LSI and MSI 
devices. 

Because of the larger size of LSI packages, the 
centrifuge and shock tests applied to conventional IC's 
can cause mechanical damage and loss of hermeticity 
unless special precautions are taken. 

Because of the increased complexity of LSI and 
MSI devices, it must be assumed that the effectiveness 
of a preseal visual inspection will not be as great as it 
is for conventional integrated circuits. The sheer com- 
plexity of these devices outstrips the ability of a human 
operator working with a microscope. This is par- 
ticularly true when one considers the increased number 
of possibilities for scratches and open metallizations 
at oxide steps, the possibility of shorting between upper 
and lower metallization levels because of pinholes or 
cracks in the insulating oxide, the possibility for opens 
due to marginal metallization alignments, and the 
possibility of failure because of high leakage between 
adjacent metallization stripes caused by photolithographic 
defects that result in poor delineation. 

SUMMARY 

The promises of LSI are still basically valid; however, 
the electronics industry has had to face tremendous 
problems in its efforts to make LSI a production 
reality. The solution to these problems has required 
the development of new approaches in almost every 
aspect of integrated circuit technology, and has re- 
quired close cooperation between the vendor and the 
user. It is, in fact, remarkable how much progress has 
been made in the past four to five years. At present, 
there are over 200 catalog part numbers for LSI de- 
vices and several LSI systems are programmed for 
some 1970 production. It now appears that 1970 will 
be the year of reality for LSI. 

INTRODUCTION 

The Graphic Display Monitoring System (GDM) is an 
experimental monitoring facility for Multics, a general 
purpose time-sharing system implemented at Project 
MAC cooperatively with General Electric and the Bell 
Telephone Laboratories.²,⁷ GDM allows design, systems 
programming, and operating staff to graphically view 
the dynamically changing properties of the time- 
sharing system. It was designed and implemented by 
the author to provide a medium for experimentation 
with the real-time observation of time-sharing system 
behavior. GDM has proven to be very useful both as a 
measuring instrument and a debugging tool and as 
such finds very general use. 

Monitoring the activity of a traditional computer 
system (one with only a single active process) is a fairly 
simple task. Hardware and software devices can easily 
be devised to keep track of almost any parameter. 
The question “What are you doing right now?”, put 
to a computer system controlling multiple processes or 
servicing multiple interactive users, however, proves 
particularly difficult to answer meaningfully. It becomes
necessary to “snapshot” the system (record in 
some manner its state at a specific time) and interpret 
this information for the inquirer. Since a basic property 
of a time-sharing system is that, in fact, it is “doing 
something else” a few milliseconds from now, what the 
inquirer really wants to ask is “What are you doing 
now, and now, and now . . .?” Implicitly, he is also 
asking to be shown what is happening in an easily 
interpretable format. The GDM solution to his problem 
is to provide the user with a real-time, graphical out- 
put “eavesdropper.” 

Statistical studies of time-sharing systems have been 
performed¹,⁶,¹¹ in an attempt to provide “after-the-fact”
monitoring (in effect answering the question 
“On the average, what is happening?”) and there have 
been simulations in an effort to provide “predictive 
monitoring.”⁶,¹¹ One company has even produced a 
hardware device to receive system status information 
over a special wired-in channel and record the results 
on magnetic tape.¹² Other than the “SNUPER Computer,”⁶
which, however, still requires engineer-installed
hardware probes, there has been little work 
directed towards providing a generalized, real-time, 
time-sharing system monitoring device. It is felt that 
while the hardware used for this implementation of 
GDM is perhaps unusual, the design principles involved 
and the monitoring methods explored are sufficiently 
general to provide a framework and a guide for other 
designers. 

The basic goal in designing the GDM System was 
to produce a time-sharing system monitoring device 
for use by the staff of the Multics project. Initial 
requirements implied that it would be on-line, that is, 
active while Multics was in operation — not just col- 
lecting data for future analysis, and would provide 
dynamically changing graphic output (as well as hard 
copy if desired). It was to be designed such that the 
act of monitoring did not cause significant interference 
to the time-sharing system or perturbations in its 
behavior and such that it would not be necessary to 
make more than a few minor additions to supervisory 
procedures in order to incorporate the GDM System 
(as opposed to monitoring done by inserting entire 
procedures in critical points in the supervisor in order 
to collect data; see Scherr¹¹ for an example). Since the 
GDM System was to be an experimental tool, it was 
also considered especially important that it be easily 
expandable and adaptable to new or different monitoring 
requests. Coupled with these requirements was the need 
to involve the expected user community as early as 
was possible in the project in order to insure its con- 
tinued use after initial implementation. In this regard, 
acceptance by the systems programming staff was very 
encouraging and many currently make use of the GDM 
facility. 

The original GDM System embodies these goals 
while making use of existing hardware at Project MAC. 
The Digital Equipment Corporation 338 (see Figure 2) 
was already on site for use in other experimental work. 
A more extensive (and less expensive) monitoring 
system could perhaps be designed if it were possible to 
choose both the display processor and the method of 
interface to the time-shared computer. This was not, 
however, viewed as a major handicap in developing a 
useful system. 

Succeeding sections will discuss the various com- 
ponents of the GDM System and will describe in detail 
initial experiments and current usage at Project MAC. 
Compromises in design and special problems due to 
the particular constraints of the display hardware or 
software and the Multics system to which they interfaced
are also discussed. 

What is the GDM system? 

Subsystems 

The GDM System consists of four major components: 

A. An input-output procedure running under 
Multics to transmit data as requested to the 
display computer. 

B. A monitor system operating on the display 
computer to facilitate the creation, storage, and 
retrieval of display templates (see below) and 
to perform various other housekeeping functions. 

C. A series of display computer subroutines for 
manipulating data and generating command 
sequences for the display. 


D. A language for describing desired data manipu- 
lation and display formats (Display Description 
Language), a (planned) compiler for translating 
such descriptions into display computer assembly 
language programs, and a set of macro-defini- 
tions for simplifying display computer program- 
ming and for calling the subroutines mentioned 
under C. 

Figure 1 gives a functional representation of the 
various GDM subsystems showing the interaction 
among them, the two computers, and the user. Figure 
2 shows the complete hardware configuration. Refer- 
ence 8 goes into considerable detail about the GDM 
monitor system software including system flow charts. 

Modes of operation 

Use of the GDM System generally falls into one of 
three classes of operation: 

1. Demonstration mode: any of a number of 
library displays may be viewed to get a general 
picture of Multics operation at the moment. 
Data used in these displays is updated periodi- 
cally according to preprogrammed instructions. 

2. XRAY mode (so named because of its similarity 
to the X-ray System⁴): the user may type the 



Figure 1 — GDM subsystem interactions 

(THIN LINES REPRESENT DATA TRANSMISSION; HEAVY LINES 
REPRESENT TRANSMISSION OF STATUS INFORMATION AND 
INTERRUPTS) 

Figure 2 — Hardware configuration 


segment number and offset of a datum (see 
Reference 3 for a description of the addressing 
scheme used in Multics) on the teletype of the 
display computer and see displayed the octal 
and ASCII character representation of its 
contents, updated every second (Figure 3, 
XRAY display). 

3. Display creation mode: the user will go through 
the process of creating his own display (as out- 
lined in Figure 1) in order to gain desired flexi- 
bility in data displayed, format of display, or 
data sampling rate. Displays are then saved in a 
special format, the “display template,” for use 
in later experiments or as part of the library. 
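
The XRAY mode in item 2 shows one datum in both octal and character form. The sketch below illustrates that formatting step only. It assumes a 36-bit word holding four 9-bit character fields, a layout the text does not state, and the function names are hypothetical; the actual GDM routines ran on the display computer and are not reproduced here.

```python
WORD_BITS = 36   # assumed word length; not stated in the text
CHAR_BITS = 9    # assumed character field width; not stated in the text
CHARS_PER_WORD = WORD_BITS // CHAR_BITS


def xray_format(word):
    """Return (octal string, character string) for one machine word."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 36 bits")
    octal = format(word, "012o")  # 36 bits -> 12 octal digits
    chars = []
    for i in range(CHARS_PER_WORD):
        shift = WORD_BITS - CHAR_BITS * (i + 1)
        code = (word >> shift) & 0x1FF
        # Show printable ASCII as-is, anything else as a dot.
        chars.append(chr(code) if 32 <= code < 127 else ".")
    return octal, "".join(chars)


def pack_chars(text):
    """Helper for the demo: pack up to four characters into one word."""
    word = 0
    for ch in text.ljust(CHARS_PER_WORD)[:CHARS_PER_WORD]:
        word = (word << CHAR_BITS) | ord(ch)
    return word


octal_text, ascii_text = xray_format(pack_chars("ABCD"))
print(octal_text, ascii_text)
```

In the real system this formatting would be repeated each second against a fresh copy of the word, as the description of XRAY mode indicates.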

All modes of operation employ the same type of 
display template and are listed only to differentiate 
between the applications of the GDM System. System 
programmers have been trained in five minutes to 
utilize the many displays already in the library (operation
under “Demonstration Mode”). Some use the 
XRAY display when there are one or two locations of 
interest at a particular moment, as in the current 



Figure 3 — XRAY display 

number of available disk pages or the value of a par- 
ticular time-dependent variable. Display creation 
mode, the most general use of the GDM System, re- 
quires the most work on the part of the user. He must 
decide what data items to display, how to display them, 
and how often to sample them. He must then create 
the data manipulation routines and the display list 
comprising his particular “display template.” Until 
the DDL compiler is constructed, this work must be 
done in an extended version of the PDP-8 Assembly 
Language as seen in Table I (the 338 computer uses 
the same systems software as its sister PDP-8). It is 
in this mode of use that all the facilities of the GDM 
System come into play and in which the most fruitful 
experimental work can be performed. 

Examples 

Figures 4 and 5 show typical examples of GDM 
displays. Figure 4, Core Memory Summary Display, 
displays real-time information on the usage of Multics 
core memory pages; Figure 5, Active Process State 
Display, displays user activity information (see below). 
The display templates for both figures were constructed 
in about two hours apiece by an experienced user and 
have provided many hours of system observation for 
experienced and inexperienced alike. 

The display in Figure 5 causes information about 
each process in Multics to be extracted from the traffic 
controller data base. The column labelled “MP” is 
the “multiprogramming state,” an indication of a process’
eligibility to receive CPU time. Stars to the right 
of this column indicate the processes that are eligible 
(state 4). The column “ST” is the “activity state”: 
running, ready to run (waiting to be serviced), or not 
ready to run. The star is next to the process currently 
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Figure 4 — Multics core memory summary display 



running, state 1. In a multi-processor configuration, 
there would be more than one such process. 

The associated bar graphs also provide a descriptive 
measure of overall system activity. By “eyeball 
integration” of the length of the bars, one can get a fairly 
accurate idea of system loading. Several means of 
calculating graph lengths have been used (in different 
display templates all using the same basic form) : 

1. Whenever a process is ready or running, the 
length of the bar is increased. When the process 
is not ready, the associated bar decreases. Each 
bar changes length as an exponentially weighted 
sum of ready-running and not-ready time. 
(This is seen in Figure 5.) 

2. Whenever a process is ready, its bar grows in 
equal time increments. When the process is 
finally serviced (receives processor time), its bar 
is reset to zero length. 
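
The two bar-length rules can be sketched as follows. This is an illustrative Python sketch, not the GDM code (which was written in extended PDP-8 assembly language). The function names, the smoothing weight `alpha`, the increment `step`, and the state names are assumptions made for the example; the text does not say what a type 2 bar does while its process is not ready, so the sketch simply leaves it unchanged.

```python
def update_activity_bar(length, is_active, alpha=0.1):
    """Type 1: exponentially weighted share of ready/running time.

    `length` is a fraction between 0 and 1. `alpha` (an assumed value)
    is the weight given to the newest sample.
    """
    sample = 1.0 if is_active else 0.0
    return (1.0 - alpha) * length + alpha * sample


def update_wait_bar(length, state, step=1.0):
    """Type 2: grow in equal increments while ready, reset when serviced.

    `state` is one of "ready", "running", "not_ready" (assumed names).
    """
    if state == "ready":
        return length + step   # still waiting in the queue
    if state == "running":
        return 0.0             # serviced: bar resets to zero length
    return length              # not ready: left unchanged (assumption)
```

Calling `update_activity_bar` once per sample makes the bar drift toward 1 while the process is ready or running and toward 0 otherwise, which matches the "percentage of time" reading of the type 1 scale.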


The display of type 1 gives a general picture of 
system loading but also shows something of the be- 
havior of the individual process. The scale is calibrated 
in percentage to indicate that the bar shows the per- 
centage of time a process is requesting or receiving CPU 
time — a measure of the process’ activity. The type 2 
display is more useful in getting an uncluttered picture 
of just how long a “ready” user must wait to run, i.e., 
how long each process is spending in the queues waiting 
for service. 

The display templates for these two displays differ 
by about ten instructions (the computation of bar 
length). The two hours of editing and assembling to get 
a “first draft” of the display is even less if averaged 
over the two displays. Herein lies a basic flexibility of 
the GDM System: once the data to be displayed have 
been decided upon, it need be only a matter of minutes 
before they are viewed. Display formats can be easily 
experimented with and a finished display template can 
be added to the GDM library for future monitoring 
without any costly “dedicated system” monitoring 
runs. 

The examples discussed above illustrate just two ideas. 
Others have included collecting (and displaying) data 
on the mean lifetime of a page in the Multics memory 
(how long does it take before the page is swapped out 
to secondary storage), the distribution over time of 
the number of active time-sharing users (very nicely 
displayed as a graph similar to Figure 6D), and the 
average number of users referencing particular super- 
visor segments (built up during the length of the moni- 
toring session) . There is a great deal of work yet to be 
done before we run out of ideas or into the limitations 
of the GDM system. 


More on the display template 
A display template (DT) consists of three sections: 

1. A list of the time-sharing system data items to 
be sampled (segment number and data base 
format are sufficient since absolute core 
locations are determined by GDM at monitoring 
time). 

2. Instructions on display type (numerical, ASCII, 
bar graph, other graphics, etc.), sampling rate, 
and data manipulations (averaging, scaling, etc.) 
for each data item or group of items. 

3. A display list: machine instructions for the 338 
Display giving text, formatting information, and 
storage for items to be displayed. 

For example, to display a single process’ activity as 
in Figure 5, a DT would contain about twenty 
instructions (Table I). 

Figure 6 — Other standard display types 

The various non-PDP-8 instructions (call, do, 
dlstart, etc.) are macro calls to a set of definitions 
designed as part of the GDM System. Various 
subroutines (nplot, ge645, sked, etc.) are also provided as 
interfaces to the GDM monitor and to simplify 
programming. These features allow the programmer without 
PDP-8 experience to design a display template with a 
minimal apprenticeship. (Implementation of a DDL 
compiler should simplify this even further.) Of course, 
since all the facilities of the computer are available, 
data manipulations can be quite complex (although 
subroutines are provided for such common operations 
as scaling and masking) and displays quite unusual 
(Figure 6 shows standard types for which GDM 
provides some macro facility). The only limits are the 
designer’s imagination and the size of the PDP-8 core 
memory. 

The Multics/GDM interface 

The GDM System is designed for use in a symbiotic 
relationship with a time-shared computer. The com- 
puter must be capable of supporting a display processor 
functioning basically independently of the time-sharing 
system but occasionally interjecting requests for data 
transmission. 

The Multics environment is particularly friendly to 
this type of system as it is possible to make data 
requests through the generalized input-output controller 
(GIOC) of the GE-645⁹ without interrupting the 
central processing unit (Figure 2). It is necessary, 
however, to dedicate two of the 2048 GIOC channel pairs 
(one for transmitting and one for receiving) to the 
display processor. The problems introduced by this 
relationship are discussed further below. 

The Multics/GDM interface procedures are capable 
of providing the following services: 

1. Accept address request by segment number and 
offset of data to be displayed (GDM). 

2. Convert this address to an absolute memory 
location for interpretation by the GIOC (GDM 
to Multics). 

3. Transmit the datum from the GE-645 memory 
to the 338 (Multics). 

In general, a GDM-type monitor requires only the 
simplest method possible of getting data from the 
time-shared computer to the display processor. On the 
Project MAC system, this means sending requests to a 
short I/O program running on the GE-645 GIOC. The 
2400 bit per second Dataphone (201B modems) used 
for this transmission limits the request rate to 
approximately twenty per second (a negligible disturbance on 
a one-and-one-half microsecond per instruction 
processor). Higher data rate transmission can be used 
with corresponding increases in interference (if we 
increase the rate to 40,000 bps, the perturbance is still 
less than 0.1 percent) and special telephone lines. 

All displays currently in use sample the GE-645 at 
rates at or near the available maximum. Displays with 
a number of data items occasionally resort to special 
sampling methods in order to update important items 
at least once a second: about the rate at which the 
human eye can follow a dynamic display with that much 
information. 

TABLE I — A display template to monitor a T-S user’s activity 

*address_table 
        tc_data                 /segment name 
        540; 541                /locations within the segment 
*data_routines 
        a                       /name table of routines to be 
                                /called by the GDM monitor 
        7777                    /end of table 
a,      0                       /PDP-8 subroutine format 
        call ge645, 1, 2        /get first data item 
        call nplot, mp, 1       /plot “MP” state number 
        call ge645, 1, 3        /get next data item 
        call nplot, st, 1       /plot “ST” state number 
        jms calc                /call to machine language 
                                /subroutine to calculate bar 
                                /graph length 
        do hplot, bar           /plot horizontal bar graph 
        call sked, 144, i a     /reschedule “a” to be called 
                                /by monitor in one second 
        jmp i a                 /PDP-8 subroutine format 
*display_list 
        dlstart                 /macro instruction to start display 
        nl; nl                  /“new line” for formatting 
mp,     0                       /storage for “MP” 
        sp2                     /spaces for display formatting 
st,     0                       /storage for “ST” 
        sp2                     /formatting 
        hbar bar                /macro to create bar graph display 
        escape                  /display instruction macro 
        top                     /display instruction macro to 
                                /cause refreshing of display 

Advantages and disadvantages of GDM 

Advantages, disadvantages, capabilities, and limi- 
tations of GDM can be grouped into two categories: 
those relating to its monitoring ability; and those 
relating to its ability to report the information moni- 
tored. 

Monitoring ability 

Several factors determine the usefulness of any type 
of monitor. These include the number and type of 
events it can monitor, the rate at which it can monitor 
them, and the interference that this observation will 
cause to the system being monitored. 

One of the capabilities of GDM is a facility to change 
the point of observation easily: this is accomplished 
through the use of the display template. A new display 
template can be designed and operational in a short 
time and, once constructed, can be added to a library 
for future recall. No hardware changes need be made, 
no plug boards rewired, no probes changed to monitor 
a new or different event. Another display template 
with a few basic instructions is all that is needed to 
change the “probe” of GDM. 

GDM, as constructed, is a sampling monitor. Current 
Dataphone connections limit requests for data items to 
about twenty per second, as mentioned above. Faster 
Dataphones, direct connections, or other means can be 
used to raise the sampling rate. The current rate is 
such that “microsecond” events cannot be monitored. 
Transient data items will be missed if their core 
location changes many times in a second. Current 
displays, therefore, limit themselves to observing only 
“wired” data, that is, data whose core location need be 
determined only once during a particular monitoring 
period although the data itself may change many times. 
As approximately 80 percent of the Multics supervisor’s 
data bases fall into this category at the current time, 
this is not particularly restrictive. 

Monitoring which requires the collection of a large 
number of statistics over a very short time period 
similarly is hindered by the current configuration 
although “long-time” statistics are collected and dis- 
played by a number of display templates. 

Under Multics, short-time event monitoring is per- 
formed by special software embedded in the Multics 
supervisor.¹⁰ A GDM display is used to observe, in 
real time, the data base of this monitor in order to see 
the buildup of the statistics over time and to note any 
abnormalities that might be missed by observing 
averages after an hour or more of operation. In this 
way, the advantages of a real-time display are combined 
with monitoring embedded in the time-sharing system 
(which causes significant interference when turned on) 
to provide a very useful tool. 

The area of system interference has already been 
discussed but one item should be emphasized. In the 
Multics configuration, GDM need take only GIOC 
time — not CPU time. In computer systems where this 
is not possible, interference will still be negligible if 
the GDM monitor “steals” only enough information 
to make a useful display. Five hundred cycles per 
second is still only 0.1 percent on a computer with a 
two-microsecond cycle time, and this is more than sufficient 
for even the most complex display. 
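
The interference figure quoted above is simple arithmetic: the fraction of each second lost is the number of stolen cycles per second times the cycle time. A minimal Python check, with a function name chosen only for this illustration:

```python
def interference_fraction(stolen_cycles_per_second, cycle_time_seconds):
    """Fraction of each second taken from the monitored machine."""
    return stolen_cycles_per_second * cycle_time_seconds


# 500 stolen cycles per second on a machine with a 2-microsecond cycle time
fraction = interference_fraction(500, 2e-6)
print(f"{fraction:.1%}")   # prints 0.1%
```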

Reporting ability 

Output of information is another area in which 
flexibility is crucial. Displays in Figures 3, 4, and 5 
show only numbers, characters, and bar graphs. Dis- 
plays have also been constructed with the types of 
graphs shown in Figure 6 and many others have been 
suggested for particular applications. It has been found 
that displaying the same information in different ways 
often presents an entirely different picture of what is 
going on. The only price to be paid for this flexibility 
is programmer time and even then it is no more difficult 
to display a bar graph (or any other type) than it is 
to simply show a number. Several display templates 
showing the same data in different formats can be 
made almost as easily as a single one and the best 
added to the GDM library. 

For those who desire hard copy, GDM, in its current 
configuration, offers only photographs of its displays 
(stopped at any instant of time, saved on tape for 
future reference or photographing). Plotters of various 
kinds could perhaps be connected in tandem with a 
dynamic display and requested to plot a particular 
instance, even while the CRT display is still changing. 
Here again, the designer is limited only by the hard- 
ware available and his imagination. 

CONCLUSIONS AND OBSERVATIONS 

The GDM System at Project MAC has served in two 
major capacities: 

1. As a monitoring “control center.” 

2. As a debugging tool. 

The very nature of a multiple-access computer 
system makes it very difficult to determine at one 
location exactly what is happening at all terminals. 
The GDM display, conveniently located near the main 
body of Multics programmers, is readily consulted to 
determine the state of a rampant user program, the 
availability of secondary storage space, or just the 
general health of the system (a slave display might 
possibly be installed near the computer itself or in the 
office of the system administrator as well). Many 
system programmers have, at one time or another, 
brought up the GDM System on their own initiative 
to find out various, otherwise unobtainable, pieces of 
information (a “cookbook” instruction sheet has been 
provided for just this purpose). A visit to the GDM 


display is always included as part of the standard 
system tour for visitors. 

As a debugging aid, GDM has been invaluable. It 
is responsible for the detection of many system bugs — 
often transient or time dependent — that were not 
easily isolatable by previously available means. 

One of the features of GDM that has made it so 
useful is its ability to simplify the act of dynamic 
display creation to the point where this is no more 
difficult than writing a simple assembly language 
program. This flexibility has paid many times over for 
the effort of implementation. 

Finally, GDM can be readily adapted for use with 
other time-sharing systems: only two Multics-depen- 
dent modules exist in the monitor and display templates 
can be designed to suit any system. 

GDM was designed as an experimental system and 
as such has been very useful at Project MAC. Its use 
during a period of intense debugging of the Multics 
system has proven its development worthwhile. 
INTRODUCTION 

This paper describes research which involves the use 
of interactive computer graphics for processing systems 
analysis networks. The term “systems analysis network” 
is used to include project scheduling, task-resource 
simulation, computer programming flow diagrams, 
decision trees, assembly line balancing, flows in networks, 
etc. These network pictures usually characterize the 
precedence relations and the logical and data flow 
among network component parts, and are traditionally 
the planning tools for industrial engineers, operations 
research analysts, and management and systems 
planners. In this research, a system is developed to 
provide a “drawing board,” through the use of inter- 
active computer graphics, to compose, transform, 
decompose, partition, simplify, merge, and regenerate 
network pictures for the purpose of facilitating rapid 
convergence in man-computer experiments. 

First, a study of the characteristics of network 
pictures, in the light of graph theory, is conducted. It 
provides a theoretical framework within which inter- 
active graphics operations can be structured. Next, 
a system of representing and processing network 
pictures through boolean matrix operations is developed. 
This is followed by the development of algorithms with 
which to regenerate network pictures; such a picture 
would be isomorphic with its original drawing while, 
at the same time, maximizing its visual effectiveness. 
Finally, a system which enables the user to perform 
various manipulation and transformation schemes is 
described. 

This research is in connection with the Biotechnology 
Laboratory of the Department of Industrial Engineering 
at Stanford University. An ADAGE computer system 
(AGT/30) with an on-line graphics terminal is being 
used under the sponsorship of N.I.H. project NLM 
00525-2 and School of Engineering, Stanford University. 

Characteristics of network pictures for systems analysis 

In this study we shall limit our attention to only the 
following types of network: 

Activity network (e.g., PERT, CPM) 

Project scheduling 
Job-resource simulation 

Flows in networks (e.g., maximal flow, shortest route) 
Decision tree 

Computer program flow diagrams 
Assembly-line balancing 

For convenience, henceforth they will be collectively called 
“systems analysis networks,” or “SA networks.” The 
logical structure of these networks gives rise to some 
common characteristics in their graphic representation. 
We shall describe some of them below: 

Independence of geometric constraints 

By independence of geometric constraints we mean 
that an SA network picture does not require rigid 
coordinate positions for its picture parts, as is the 
case in a drawing of a physical object. An SA network is 
essentially a directed line graph³ with only precedence 
relations to be considered. In fact, it can be constructed 
with only nodes and arcs. A node generally represents 
an event, a machine, an operation, etc.; an arc may 
represent an activity or a flow and, at the same time, 
gives a sense of precedence. 

The order in which operations or decisions are 
performed in an SA network is expressed by precedence 
relations. A precedence relationship exists between nodes 
belonging to the same path. We say node x precedes 
node y to imply that y cannot occur until x has occurred. 
This relation may be expressed by precedence operators 
with symbols >, <. The expression x > y implies x 
precedes y, or equivalently, y < x (y is preceded by x). 
The precedence relationship is transitive, i.e., if x > y 
and y > z, then x > z. All nodes in the same network 
which can relate to each other in this manner belong to a 
partially ordered field which we shall call a “transitivity 
closure.” An immediate precedence relationship between 
two nodes is represented by → or ←. x → y implies x 
immediately precedes y; x ← y implies x is immediately 
preceded by y. The →, ← operators (link operators) do 
not have transitive properties. 

Therefore, an SA network picture can be defined the 
same as a directed graph, which we shall denote by G. 
G = (X, F), where X is the set of all nodes in G and F 
is a “precedence function” defined over X. F(x) is the 
set of all immediate successors of node x in G. The 
expression $y \in F(x)$, or simply $y \in Fx$, implies 
that node x and node y (both belonging to X) are 
connected by a directed arc (an arrow) pointing from x 
toward y. We denote this arc by (x, y), where x is called 
the first node and y the second node; x and y are called 
“adjacent nodes.” (Henceforth, an individual node will 
be identified by a lower case letter of the English 
alphabet, with or without subscript.) 

The letter A denotes the set of all arcs in G. The 
expressions G = (X, F) and G = (X, A) are equivalent. 

F is not necessarily a single-valued function; for 
example, we may have F(x) = {u, v, w}, i.e., there are 
three arcs emanating from node x: (x, u), (x, v), and 
(x, w). $F^{-1}$ is an inverse function (the set of all 
immediate predecessors) where 
$F^{-1}(y) = \{x \mid y \in F(x)\}$. Thus, if (u, y) and 
(x, y) are the arcs with y as their second node, then 
$F^{-1}(y) = \{u, x\}$. 
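
One convenient way to hold F in a program is as a mapping from each node to the set of its immediate successors; the arc set A and the inverse function then follow mechanically. The sketch below is an illustration in Python, not part of the system described in the paper; the helper names and the small example graph are assumptions, and every node is assumed to appear as a key of the mapping.

```python
def arcs(F):
    """The arc set A equivalent to the precedence function F."""
    return {(x, y) for x, successors in F.items() for y in successors}


def inverse(F):
    """F^{-1}: map each node to the set of its immediate predecessors."""
    F_inv = {x: set() for x in F}
    for x, successors in F.items():
        for y in successors:
            F_inv[y].add(x)
    return F_inv


# Hypothetical graph with arcs (x, u), (x, v), (x, y) and (u, y).
F = {"x": {"u", "v", "y"}, "u": {"y"}, "v": set(), "y": set()}
```

With this graph, `inverse(F)["y"]` is the set containing `u` and `x`, the two nodes whose arcs have y as their second node.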


The functions $F^2, F^3, \ldots, F^n$ are defined by: 
$F^2 x = F(Fx)$, $F^3 x = F(F^2 x)$, ..., 
$F^n x = F(F^{n-1} x)$. Likewise, 
$F^{-2} y = \{x \mid y \in F^2 x\}$, 
$F^{-3} y = \{x \mid y \in F^3 x\}$, ..., 
$F^{-n} y = \{x \mid y \in F^n x\}$. $F^n x$ is called the 
“nth generation successor set of x,” and $F^{-n} x$ the 
“nth generation predecessor set of x.” 

To preserve the consistency of transitivity relation- 
ship (so that y cannot be both a successor and a 
predecessor to x) we shall regard all SA networks as 
being acyclic (i.e., there is no directed cycle in G). The 
cyclic conditions may be treated as acyclic with the use 
of “equivalent nodes” as will be discussed later. 

We call $\hat{F}x$ the “successor set” of x: 
$\hat{F}x = \bigcup_{i=1}^{k} F^i x$, where 
$F^k x \neq \emptyset$ and $F^{k+1} x = \emptyset$. We call 
$\hat{F}^{-1}x$ the “predecessor set” of x: 
$\hat{F}^{-1}x = \bigcup_{i=1}^{k} F^{-i} x$, where 
$F^{-k} x \neq \emptyset$ and $F^{-(k+1)} x = \emptyset$. We 
define the “forward transitivity closure” of x by 
$\{x\} \cup \hat{F}x$, the “inverse transitivity closure” 
of x by $\{x\} \cup \hat{F}^{-1}x$, and the “transitivity 
closure” of x, $\bar{F}(x)$, by 
$\{x\} \cup \hat{F}x \cup \hat{F}^{-1}x$. 
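
The generation sets and closures defined above can be computed by repeatedly applying F until a generation comes up empty, which is guaranteed to happen in an acyclic graph. The following Python sketch is illustrative only; the function names and the example graph are assumptions, F is held as a mapping from each node to its set of immediate successors, and every node is assumed to appear as a key.

```python
def inverse(F):
    """F^{-1}: map each node to its set of immediate predecessors."""
    F_inv = {x: set() for x in F}
    for x, successors in F.items():
        for y in successors:
            F_inv[y].add(x)
    return F_inv


def generation(F, x, n):
    """F^n x: nodes reached from x by following exactly n arcs."""
    current = {x}
    for _ in range(n):
        current = set().union(*(F[z] for z in current))
    return current


def successor_set(F, x):
    """Union of F^1 x, F^2 x, ... up to the last non-empty generation."""
    result, current = set(), set(F[x])
    while current:  # terminates because G is assumed acyclic
        result |= current
        current = set().union(*(F[z] for z in current))
    return result


def transitivity_closure(F, x):
    """{x} together with every successor and every predecessor of x."""
    return {x} | successor_set(F, x) | successor_set(inverse(F), x)


# Hypothetical graph: a -> b -> c, a shortcut a -> c, and an unrelated d.
F = {"a": {"b", "c"}, "b": {"c"}, "c": set(), "d": set()}
```

The predecessor set needs no separate routine: it is the successor set computed on the inverse function.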

Subgraphs, partial graphs, partitions, and 
reduced graphs 

A subgraph of G = (X, F) is a graph $(Z, F_Z)$ where 
$Z \subset X$, and for all nodes x in Z, 
$F_Z x = (Fx) \cap Z$; i.e., a subgraph of G is the result 
of taking away at least one node, and its associated 
arcs, from G. A partial graph of G is a graph of the form 
(X, F') where $F'x \subset Fx$ for all x in X, i.e., a 
partial graph of G has all the nodes of G but without 
some (at least one) of its arcs. 

$X_1, X_2, \ldots, X_r$ constitute a partition of X if: 
(1) $\bigcup_{i=1}^{r} X_i = X$; (2) for every i and j, 
$i \neq j$, and $i, j \leq r$, $X_i \cap X_j = \emptyset$. 
A graph $G^0 = (X^0, A^0)$, where 
$X^0 = \{X_1, X_2, \ldots, X_r\}$ and $A^0$ is the set of 
arcs, is called a reduced graph of G. 
$(X_i, X_j) \in A^0$, $i \neq j$, if and only if there exist 
a node $x \in X_i$ and a node $y \in X_j$ such that 
$(x, y) \in A$. 
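
As an illustration of the definition, a reduced graph can be built by mapping every node to the index of the partition block that contains it and keeping only the arcs that join two different blocks. This Python sketch uses assumed names and a made-up example; blocks are identified by their position in the partition list.

```python
def reduced_graph(arcs, partition):
    """Return the arc set A0 of the reduced graph.

    `arcs` is a set of (x, y) pairs; `partition` is a list of disjoint
    node sets that together cover every node used in `arcs`.
    """
    block_of = {}
    for i, block in enumerate(partition):
        for node in block:
            if node in block_of:
                raise ValueError(f"{node!r} appears in two blocks")
            block_of[node] = i
    return {
        (block_of[x], block_of[y])
        for x, y in arcs
        if block_of[x] != block_of[y]   # arcs inside one block vanish
    }


# Hypothetical chain a -> b -> c -> d, partitioned into two blocks.
A = {("a", "b"), ("b", "c"), ("c", "d")}
A0 = reduced_graph(A, [{"a", "b"}, {"c", "d"}])
```

Here `A0` contains the single arc `(0, 1)`, because only the arc (b, c) crosses from the first block to the second.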

Common basic diagrams 

Basic diagrams are subgraphs which possess certain 
topological characteristics into which an SA graph can 
be decomposed. We consider all SA graphs as aggregates 
of some basic diagrams. It is advantageous that these 
basic diagrams be prestored in a dictionary, so that 
it is not necessary to enumerate the topological details 
of a basic diagram each time it occurs in a graph. 
A graph may be collapsed into a simpler form by 
reducing the number of nodes and arcs in some, or all, of 
the basic diagrams contained in the graph (graph 
reduction is explained in the last section of this paper). 
We introduce some of the most commonly used basic 
diagrams below: 

Closed Serial Path (see Fig. 1.a): Many SA networks 
are constructed with individual paths (e.g., job-resource 
simulation). A serial path is a simple and elementary path 
R(x, y) having n nodes and exactly n − 1 arcs; x and y 
are the exterior points of R(x, y). Nodes in R(x, y) which 
are successors of x and predecessors of y are interior 
points of R(x, y). A closed serial path has the properties 
that: (1) $Fx$ and $F^{-1}y$ are singletons; (2) for each 
interior point z, $Fz$ and $F^{-1}z$ are singletons. A 
serial path which violates the aforementioned properties 
is an open serial path. The simplest form of a closed 
serial path is two nodes linked by an arc. In such a case, 
there is no interior point. 

Simple Out-Branch (see Fig. 1.b): Branches in SA 
networks often indicate decision points. An out-branch 
occurs at a node x if |Fx| > 1. A simple out-branch 
requires that there is no path between members of Y 
where Y is a subset of F(x). In our example (Fig. 1.b), 
Y = {a, b, e}. 

Figure 1 — Basic diagrams: (a) closed serial path; 
(b) simple out-branch; (c) simple in-branch; 
(d) closed parallel path 

Simple In-Branch (see Fig. 1.c): An in-branch occurs at 
a node x if $|F^{-1}x| > 1$. To be a simple in-branch there 
must not exist a path between members of W where W 
is a subset of $F^{-1}(x)$. In our example, W = {a, b, c}. 

Closed Parallel Path (see Fig. 1.d): A closed parallel 
path PP(x, y) implies that there is more than one 
closed serial path from x to y with x, y as their exterior 
points. 
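
Recognizing a basic diagram amounts to testing these set conditions on F. As one example, the Python sketch below tests the simple out-branch condition with Y taken to be all of F(x); this is an illustrative reading with assumed function names, not the dictionary lookup used by the system described here. The same test applied to the inverse function detects a simple in-branch.

```python
def reachable(F, start):
    """All nodes that can be reached from `start` along directed arcs."""
    seen, stack = set(), list(F[start])
    while stack:
        z = stack.pop()
        if z not in seen:
            seen.add(z)
            stack.extend(F[z])
    return seen


def is_simple_out_branch(F, x):
    """True if |Fx| > 1 and no path joins two members of Y = F(x)."""
    Y = F[x]
    if len(Y) <= 1:
        return False          # not an out-branch at all
    return all(reachable(F, a).isdisjoint(Y - {a}) for a in Y)


# Hypothetical examples: a clean three-way branch, and a branch in which
# one member of Y (a) also leads to another member (b).
clean = {"x": {"a", "b", "c"}, "a": set(), "b": set(), "c": set()}
tangled = {"x": {"a", "b"}, "a": {"b"}, "b": set()}
```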

The concept of weighted arcs and nodes 

In an SA network an arc serves two functions: (1) to 
connect two nodes and give a sense of precedence, and 
(2) to carry values. For example, cost, capacity, 
distance, flow units, data string, time, speed, and 
probabilities are values which may be associated with an 
arc. We call those arcs which serve both functions 
weighted arcs. Similarly, a node may be weighted. For 
example, in a job-resource simulation a node is typically 
a processing station which contains channel capacity, 
mean processing rate, probability distribution function, 
queue storage, etc. 

Picture composition and storage 

Picture composition 

To begin an interactive experiment a user must be 
able to draw a diagram on the CRT similar to the kind 
of diagrams he usually draws on paper. An SA network 
picture may be drawn by either manual input from the 
ADAGE graphics terminal, using joystick and light pen, 
or by programmed statements, or a combination of both. 
During the drawing phase of the experiment the console 
input involves the permanent display of certain 
“function keys” and “graphic primitives” on the bottom 
and the right edge of the CRT screen. A graphic 
primitive (one of the node symbols) is picked up and 
moved to the desired position on the screen with the 
movement of the tracking cross which is directed by the 
console joystick. Directed arcs are created by connecting 
nodes with the movement of the joystick. Alphanumerical 
labels for each node may be entered via the 
console typewriter. Figure 2.a shows the free-hand 
drawing of a project scheduling³ network picture on 
the CRT. 

Equivalently, a picture may be composed by 
programming. We introduce some of the commonly used 
operators, together with some examples, below (with 
the convention that a network picture progresses from 
left to right): 

The Link Operators (see Fig. 3 for examples): 

The Precedence Operators (see Fig. 4 for examples): 

x > y    x precedes y 

x < y    x is preceded by y 

Figure 2a — Initial drawing of a project scheduling 
network 

Figure 2b — The connection matrix (C-matrix) of 
a project scheduling network 

Figure 3 — Examples of link operators 

Figure 4 — Examples of precedence operators: 
x > [a, b, c]; x → y > [a, b, c] 

Retention of picture information with Boolean 
matrix operations 

The special topological characteristics of the SA 
networks (i.e., a picture is defined by the precedence and 
logical relationships among network components rather 
than their geometric attributes) permit us to make a 
radical departure from the conventional means of 
picture storage in which the coordinates and other 
geometric specifications (e.g., radius, angles) of vectors 
or primitives are to be remembered. The picture 
retention scheme for the SA networks involves a 
minimum amount of information, yet it preserves the 
isomorphism of the picture topology as well as the 
“meaning” of the picture. Under this scheme, a picture 
may be regenerated for the purpose of CRT display or 
for the purpose of revision, decomposition, reduction 
or merging with other pictures. This is done through 
the Network Picture Processing Language (NPPL) 
which employs boolean matrix operations for various 
picture manipulations. 

During the picture composing phase, while a picture 
is being drawn on the CRT by the user, a “connection 
matrix,” or “C-matrix” is constructed in the working 
storage of the computer. The C-matrix is a boolean matrix¹ 
with dimension n × n, where n is the total number of 
nodes in G. If we label the row corresponding to $x_i$ by i, 
and the column corresponding to $x_j$ by j, then the 
element of C has the value $c_{ij} = 1$ if 
$(x_i, x_j) \in A$; $c_{ij} = 0$ if $(x_i, x_j) \notin A$. 
$x_i$ is a source node if $\sum_{k=1}^{n} c_{ki} = 0$ and 
$\sum_{k=1}^{n} c_{ik} > 0$. $x_i$ is a sink node if 
$\sum_{k=1}^{n} c_{ki} \geq 1$ and 
$\sum_{k=1}^{n} c_{ik} = 0$. $x_i$ is an isolated point if 
$\sum_{k=1}^{n} c_{ki} = \sum_{k=1}^{n} c_{ik} = 0$. 

For example, Figure 2.b shows the C-matrix for the 
project scheduling network sketch given in Figure 2.a. 
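
The source, sink, and isolated-point tests are column sums and row sums of the C-matrix. A short Python sketch, with assumed function names and a made-up four-node example, shows the idea; it is an illustration rather than the NPPL implementation.

```python
def connection_matrix(nodes, arcs):
    """Build the n x n boolean C-matrix: C[i][j] = 1 if (x_i, x_j) is an arc."""
    index = {x: i for i, x in enumerate(nodes)}
    n = len(nodes)
    C = [[0] * n for _ in range(n)]
    for x, y in arcs:
        C[index[x]][index[y]] = 1
    return C


def classify(C, i):
    """Classify node i from its column sum (arcs in) and row sum (arcs out)."""
    n = len(C)
    arcs_in = sum(C[k][i] for k in range(n))
    arcs_out = sum(C[i][k] for k in range(n))
    if arcs_in == 0 and arcs_out == 0:
        return "isolated"
    if arcs_in == 0:
        return "source"
    if arcs_out == 0:
        return "sink"
    return "interior"


# Hypothetical network: s -> a -> t, plus a node z with no arcs.
nodes = ["s", "a", "t", "z"]
C = connection_matrix(nodes, [("s", "a"), ("a", "t")])
kinds = [classify(C, i) for i in range(len(nodes))]
```

For this example `kinds` is `["source", "interior", "sink", "isolated"]`.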

In order to conserve computer memory, the C-matrix 
is converted into a “precedence matrix,” or “P-matrix,” 
before storage. A precedence matrix is a connection 
matrix with its rows (columns) arranged in accordance 
with the precedence relations in G. The rules of 
arrangement are as follows: 

1. If $y \in \hat{F}x$ (y is in the successor set of x), 
then y must be placed after x (i.e., the row (column) 
associated with y must have a larger index number 
than that of x). 

2. If $y \in \hat{F}^{-1}x$ (y is in the predecessor set 
of x), then y must be placed before x. 

3. If $y \notin \bar{F}x$ (y is not in the transitivity 
closure of x), then the order between x and y is 
irrelevant. 

Figure 5 shows the P-matrix of the project scheduling 
network of Figure 2.a. We notice that the P-matrix is 
triangular (this will always be true if the precedence 
relations hold), and it is predominantly populated 
with zeros. Both features contribute to the economical 
use of core storage. 
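
Any ordering of the nodes that satisfies the three rules is what is usually called a topological ordering. The paper does not say how its ordering is computed, so the Python sketch below substitutes a standard technique, Kahn's algorithm, and then builds the matrix in that order; the function names and the example are assumptions. The resulting matrix has all of its ones strictly above the diagonal.

```python
from collections import deque


def precedence_order(nodes, arcs):
    """Order nodes so that every arc runs from an earlier to a later node."""
    indegree = {x: 0 for x in nodes}
    successors = {x: [] for x in nodes}
    for x, y in arcs:
        successors[x].append(y)
        indegree[y] += 1
    queue = deque(x for x in nodes if indegree[x] == 0)
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in successors[x]:
            indegree[y] -= 1
            if indegree[y] == 0:      # all predecessors of y are placed
                queue.append(y)
    if len(order) != len(nodes):
        raise ValueError("graph has a directed cycle")
    return order


def precedence_matrix(nodes, arcs):
    """Return (order, P): the connection matrix with rows and columns reordered."""
    order = precedence_order(nodes, arcs)
    index = {x: i for i, x in enumerate(order)}
    P = [[0] * len(order) for _ in order]
    for x, y in arcs:
        P[index[x]][index[y]] = 1
    return order, P


# Hypothetical network entered in an arbitrary node order.
order, P = precedence_matrix(["t", "a", "s"],
                             [("s", "a"), ("a", "t"), ("s", "t")])
```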

Picture regeneration 

Convention of network picture arrangement 

A network picture stored in a computer file may be 
retrieved in its entirety, or in part, for CRT display. It 
is also desirable to redisplay a picture immediately 
after it is drawn because invariably the computer will 
generate a “better” picture than the one drawn by the 
user. In our present effort the convention for a network 
picture generated by the computer includes the 
following rules: 

1. All the source nodes are placed at the left end 
of the screen which means that the network 
pictures progress from left to right. 

2. Only forward arrows are allowed, i.e., no back- 
ward or vertical arrows. 

3. All arcs are made of linear segments. 

4. Line crossings are to be minimized. 

5. Other visual effectiveness considerations. 

Figure 5 — The precedence matrix (P-matrix) of a 
project scheduling network 


As mentioned before, the special structure of the SA 
networks allows us to generate a network picture with 
an efficient generator which presupposes the topological 
characteristics of the graph, thus, reducing drastically 
the storage requirement. The graph generator of the 
NPPL operates on the P-matrix (or C-matrix) and 
transforms it into a graph image with all routes of 
interconnections “optimized.” For example, Figure 6 is 
the same network picture as shown in Figure 2.a but it is 
drawn with the convention and constraints set by 
NPPL. 



Algorithms for optimum routing for 
interconnections 

We consider the graph area as being a rectangular grid. 
The nodes of a graph are always placed at the 
intersections of the vertical and horizontal lines. We call the 
vertical lines “stages” and the horizontal lines “levels.” 
If we can place each node of the graph at its proper stage 
and level, and connect them according to the F function, 
then a graph is generated. Figure 7 shows the project 
scheduling network with each “stage” indicated. If the 
“stage” and “level” assignments are not properly made, 
the result may be backward arrows and frequent 
line crossings. Both features are undesirable 
from a visual effectiveness point of view. We shall 
briefly list the procedures for assigning stages as follows, 
using the example of the project scheduling network: 

1. Place all source nodes in stage 0, S(0). e.g., 
S(0) = {s}. 

2. Obtain S'(l) = w F(x). e.g., S'(l) = {e}. 

x e S(0) 

&. Obtain S' (2) = w F(x). e.g., S' (2) = {f}. 
x € S'(l) 

In general, S' (n) = w F(x). 

x e S'(n— 1) 

4. If S'(n) ^ S'(n + k) = Y ^ <j> (where k > 1) 
then S'(n) would be modified by labeling mem- 
bers of Y in S'(n) as “dummy nodes,” and the 
successors of dummy nodes would be deleted 
from any subsequent stages. The dummy nodes 
will be repeated at each succeeding stage until 
S'(n + k). The function of the dummy node is to 
be a “place marker” for an arc which crosses 



Figure 6 — Regenerated picture of a project 
scheduling network 


Figure 7 — “Stages” of a network picture

S(0) = {s}
S(1) = {e}
S(2) = {F}
S(3) = {BP, WG, SD}
S(4) = {RP, BF, FW, H*, B}
S(5) = {H, P*, R, G*}
S(6) = {P, GD, G*}
S(7) = {FI, G}
S(8) = {K, C, FP, L}
S(9) = {Pt, V*, T*}
S(10) = {V, EF, T*}
S(11) = {T}

*Dummy Node

Figure 8 — Assigning stages

several stages (this is often necessary in order to 
avoid backward or vertical arrows) and to keep 
it free from interference from other arcs or nodes. 

5. The “scan” process will continue until stage m is
reached where $S(m) = \emptyset$, and all $S'(j)$,
$j = 1, 2, \ldots, m - 2$, have been modified (i.e., the
labeling of dummy nodes and the deletion of
their successors). The modified stages are then
denoted by $S(j)$ for all $j \neq 0$. Figure 8 shows
the result of the scan process as it applies to the
project scheduling network.
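
The scan procedure of steps 1 through 5 can be sketched in Python. This is only an illustrative reformulation, not the NPPL implementation: it assumes an acyclic graph given as a successor dictionary, places each node at the stage equal to its longest distance from a source, and inserts a dummy node (marked with `*`) in every stage between a node's earliest appearance and its final stage. The function name `assign_stages` is hypothetical.

```python
from collections import defaultdict

def assign_stages(succ):
    """Return a list of stages; dummy nodes carry a trailing '*'.

    succ maps each node to the list F(x) of its immediate successors.
    The graph is assumed to be acyclic.
    """
    nodes = set(succ) | {y for ys in succ.values() for y in ys}
    preds = defaultdict(set)
    for x, ys in succ.items():
        for y in ys:
            preds[y].add(x)

    stage = {}

    def final_stage(v):
        # Final stage = length of the longest path from any source node.
        if v not in stage:
            stage[v] = 1 + max((final_stage(p) for p in preds[v]), default=-1)
        return stage[v]

    for v in nodes:
        final_stage(v)

    stages = defaultdict(list)
    for v in sorted(nodes):
        stages[stage[v]].append(v)
    # A node first reached by a short path is held as a "place marker"
    # in every stage between that first appearance and its final stage.
    for y in sorted(nodes):
        if preds[y]:
            first = min(stage[p] for p in preds[y]) + 1
            for n in range(first, stage[y]):
                stages[n].append(y + "*")
    return [stages[n] for n in range(max(stage.values()) + 1)]
```

For a small hypothetical graph with arcs s-a, s-c, a-b, and b-c, node c is first reachable at stage 1 but belongs in stage 3, so it appears as the dummy `c*` in stages 1 and 2.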

The next step is to assign “levels.” Figure 9.b shows a
graph of the project scheduling network from stage 7
through 11, using the order of node appearance in each
stage (Figure 8) as the initial “level” assignment.
Figure 9.a gives its associated P-matrix with stage
partitions shown. As can be seen in Figure 9.b, the
“unoptimized” version of node positioning results in
two line crossings ((FP, Pt) with (C, V*), and (Pt, EF)
with (V*, V)). The crossing violations can also be
detected from the P-matrix as shown in Figure 9.a.

The P-matrix of Figure 10.a shows proper level assignment
(thus, the matrix is called the P*-matrix). This is done
by interchanging row (column) positions of K with C in
stage 8, and Pt with V* in stage 9. The optimized graph
is shown in Figure 10.b.



The criteria of optimizing the rows and columns of 
the P-matrix, in order to minimize line crossings, are: 

1. Interchange rows and columns only within each 
stage. 

2. The non-zero elements of each row should be 
consecutively located. 

3. If the non-zero element of a row begins in 
column j, then no non-zero element of any 
previous row may begin in a column with 
column index less than j. 

An optimal, or near optimal condition may be 
achieved by rearranging the columns and rows of 
P-matrix belonging to the same stage such that the 
resultant matrix meets, or most nearly meets, the above 
criteria. 
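
The effect of reordering nodes within a stage can be illustrated with a short Python sketch. It does not operate on the P-matrix itself; as a stated simplification, it counts crossings directly from the level orders of two adjacent stages and tries every permutation of the right-hand stage, which is practical only for small stages. The names `crossings` and `best_order` are hypothetical.

```python
from itertools import permutations

def crossings(left, right, arcs):
    """Count arc crossings between two adjacent stages.

    left, right: node lists in level order; arcs: (x, y) pairs, x in left, y in right.
    """
    pos_l = {v: i for i, v in enumerate(left)}
    pos_r = {v: i for i, v in enumerate(right)}
    ends = [(pos_l[x], pos_r[y]) for x, y in arcs if x in pos_l and y in pos_r]
    # Two arcs cross when their end points appear in opposite orders.
    return sum(
        1
        for i in range(len(ends))
        for j in range(i + 1, len(ends))
        if (ends[i][0] - ends[j][0]) * (ends[i][1] - ends[j][1]) < 0
    )

def best_order(left, right, arcs):
    """Exhaustively pick the level order of `right` with the fewest crossings."""
    return list(min(permutations(right), key=lambda o: crossings(left, list(o), arcs)))

# Only the two arcs named in the text are used; other arcs of the network are omitted.
stage8 = ["K", "C", "FP", "L"]
stage9 = ["Pt", "V*", "T*"]
arcs = [("FP", "Pt"), ("C", "V*")]
print(crossings(stage8, stage9, arcs))               # initial level assignment
print(crossings(stage8, ["V*", "Pt", "T*"], arcs))   # after interchanging Pt and V*
```

With these two arcs the initial order gives one crossing, and interchanging Pt and V* in stage 9 removes it.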

The dummy nodes are not displayed on the CRT as
full symbols; instead they are merely treated as point
vectors which often serve as pivot points for arcs. For
example, see arc (T*, T) in Figure 10.a.

Figure 10 — An example of proper “level” assignment

The handling of circuits 

A graph which contains one or more circuits (i.e., 
directed cycles) is called a cyclic graph. A computer 
program flow diagram is typically a cyclic graph since 
program loops are the rule rather than the exception. 
Certain job-resource simulation models in which
rejected product recycles back for rework at a previously
encountered work station also constitute cyclic
graphs. While a cyclic condition can always be handled
with connectors, thus making the resultant graph 
acyclic, it is more desirable to use directed arcs to 
display the actual circuits. In NPPL, the backward 
arrow, which represents the feed-back portion of the 
circuit, is treated as a forward arc during the graph 


regeneration phase. Upon completion of the line, an 
arrowhead facing the opposite direction is placed at the 
beginning of the arc (i.e., the left end of the arc). The 
presence or absence of circuits can be detected by 
examining the diagonal elements of the T-matrix. 
A non-zero diagonal element signifies the existence of a 
circuit. 
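
The diagonal test can be sketched as follows. As an assumption made for brevity, the transitivity matrix is computed here with Warshall's algorithm rather than by summing boolean powers of the C-matrix; both give a matrix whose (i, j) element is non-zero exactly when a path of length one or more leads from node i to node j. The function name `has_circuit` is hypothetical.

```python
def has_circuit(c):
    """Return True if the graph with zero-one connection matrix c contains a circuit."""
    n = len(c)
    t = [row[:] for row in c]  # t becomes the transitivity matrix
    for k in range(n):
        for i in range(n):
            if t[i][k]:
                for j in range(n):
                    if t[k][j]:
                        t[i][j] = 1
    # A non-zero diagonal element means some node can reach itself.
    return any(t[i][i] for i in range(n))

print(has_circuit([[0, 1], [1, 0]]))  # two nodes feeding each other
print(has_circuit([[0, 1], [0, 0]]))  # a single forward arc
```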

Picture manipulation 

Union of graphs $G_1 + G_2$

Let $G_1 = (X_1, A_1)$ and $G_2 = (X_2, A_2)$. $G_1 + G_2$ is
feasible if (1) $X_1 \cap X_2 = \emptyset$, or (2) $X_1 \cap X_2 \neq \emptyset$ and the
set of common nodes (i.e., $X_1 \cap X_2$) have the same order
of presence in both P-matrices $P(G_1)$ and $P(G_2)$. The
procedure of obtaining the P-matrix associated with the
graph $G_1 + G_2$, provided that $P(G_1)$ and $P(G_2)$ have
been obtained, is as follows:

If $X_1 \cap X_2 = \emptyset$:

1. “Fill” operation for $P(G_1)$: Expand $P(G_1)$ by
adding $|A_2|$ zero row vectors of dimension
$|A_1| + |A_2|$ to the bottom of $P(G_1)$, and $|A_2|$ zero
column vectors to the right of $P(G_1)$, i.e., after
the last column in $P(G_1)$. Thus, we have obtained
the expanded $P(G_1)$, $\rho P(G_1)$.

2. “Fill” operation for $P(G_2)$: Expand $P(G_2)$ by
adding $|A_1|$ zero row vectors to the top, and $|A_1|$
zero column vectors to the left, of $P(G_2)$. Thus,
we have obtained $\rho P(G_2)$.

3. Finally, $P(G_1 + G_2) = \rho P(G_1) + \rho P(G_2)$. The
symbol + stands for element-to-element boolean
“inclusive-or” operations.

If $X_1 \cap X_2 \neq \emptyset$:

1. “Fill” operation for $P(G_1)$: To obtain $\rho P(G_1)$ we
fill $P(G_1)$ with zero row (column) vectors of
dimension $|A_1| + |A_2|$ corresponding to those
nodes which are in $G_2$ but are not in $G_1$.

2. “Fill” operation for $P(G_2)$: To obtain $\rho P(G_2)$ we
fill $P(G_2)$ with zero row (column) vectors of
dimension $|A_1| + |A_2|$ corresponding to those
nodes which are in $G_1$ but not in $G_2$.

3. $P(G_1 + G_2) = \rho P(G_1) + \rho P(G_2)$.

See Figure 11 for an example of $G_1 + G_2$.
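
The fill-and-or procedure can be sketched in Python. The sketch assumes each P-matrix is held as a list of node labels plus a square zero-one matrix indexed in that label order, and it simply appends the nodes found only in the second graph after those of the first; it does not check that common nodes have the same order in both matrices. The names `fill` and `union` are hypothetical.

```python
def fill(labels, m, all_labels):
    """Expand matrix m (indexed by labels) with zero rows/columns for missing nodes."""
    idx = {v: i for i, v in enumerate(labels)}
    return [
        [m[idx[r]][idx[c]] if r in idx and c in idx else 0 for c in all_labels]
        for r in all_labels
    ]

def union(l1, m1, l2, m2):
    """Return (labels, matrix) of G1 + G2 by element-wise inclusive-or."""
    all_labels = l1 + [v for v in l2 if v not in l1]
    a = fill(l1, m1, all_labels)
    b = fill(l2, m2, all_labels)
    return all_labels, [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# G1 has the arc (a, b); G2 has the arc (b, c); node b is common to both.
labels, p = union(["a", "b"], [[0, 1], [0, 0]], ["b", "c"], [[0, 1], [0, 0]])
print(labels)
print(p)
```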

Intersection of graphs $G_1 * G_2$

Assuming $G_1 \cap G_2 \neq \emptyset$, we first obtain a compressed
matrix $\gamma P(G_1)$ by striking out all rows and columns
which are associated with non-common nodes. By a
similar method, we also obtain $\gamma P(G_2)$. As in $G_1 + G_2$,
nodes in $X_1 \cap X_2$ must have the same order in both
P-matrices. $P(G_1 * G_2) = \gamma P(G_1) * \gamma P(G_2)$, where the *
symbol indicates an element-by-element “and” operation
of two boolean matrices, i.e., $A * B = C$ implies
$c_{ij} = a_{ij} \wedge b_{ij}$. We use $G_1$ and $G_2$ as illustrated in
Figures 11.a and 11.b to show the result of $G_1 * G_2$ in
Figure 12.

Figure 11 — An example of $G_1 + G_2$ (panels: $\rho P(G_1)$, $\rho P(G_2)$, $\rho P(G_1) + \rho P(G_2)$)

Deletion operations $G_1 - G_2$

Since $G_1 - G_2$ implies $G_1 - (G_1 * G_2)$, we strike out,
from $P(G_1)$, all those rows and columns which belong to
the set of common nodes; i.e., all the common nodes of
$G_1$ and $G_2$, together with their associated arcs, are
deleted from $G_1$.

The $-$ operator is also a unary operator; e.g., the
expression $-G_1$ produces the complement of subgraph
$G_1$, or $G - G_1$, where $G_1 \subset G$; the expression $-A$
produces $G - A$ where $A \subset G$; $-x$ means the deletion
of node x, together with its associated arcs, from G;
$-(x, y)$ deletes the arc (x, y) from G.
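
Intersection and deletion can be sketched in the same style. The representation (a label list plus a zero-one matrix in that label order) and the names `intersection` and `delete` are assumptions made for illustration; the sketch takes the common nodes in the order in which they appear in the first graph.

```python
def intersection(l1, m1, l2, m2):
    """Return (labels, matrix) of G1 * G2: compress to common nodes, then 'and'."""
    common = [v for v in l1 if v in l2]
    i1 = {v: i for i, v in enumerate(l1)}
    i2 = {v: i for i, v in enumerate(l2)}
    return common, [
        [m1[i1[r]][i1[c]] & m2[i2[r]][i2[c]] for c in common] for r in common
    ]

def delete(l1, m1, l2):
    """Return (labels, matrix) of G1 - G2: strike out rows/columns of common nodes."""
    keep = [v for v in l1 if v not in l2]
    i1 = {v: i for i, v in enumerate(l1)}
    return keep, [[m1[i1[r]][i1[c]] for c in keep] for r in keep]

l1, m1 = ["a", "b", "c"], [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
l2, m2 = ["b", "c", "d"], [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(intersection(l1, m1, l2, m2))  # only the arc (b, c) is in both graphs
print(delete(l1, m1, l2))            # only node a survives
```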

Properties of +, *, and − operators

The operators described in the previous three sections
have the following algebraic properties:

1. Commutative:
$G_1 + G_2 = G_2 + G_1$,
$G_1 * G_2 = G_2 * G_1$

2. Associative:
$G_1 + G_2 + G_3 = G_1 + (G_2 + G_3) = (G_1 + G_2) + G_3$,
$G_1 * G_2 * G_3 = G_1 * (G_2 * G_3) = (G_1 * G_2) * G_3$

3. Distributive:
$G_1 * (G_2 + G_3) = (G_1 * G_2) + (G_1 * G_3)$,
$G_1 + (G_2 * G_3) = (G_1 + G_2) * (G_1 + G_3)$

4. De Morgan’s Law:
$G_1 - (G_2 + G_3) = (G_1 - G_2) * (G_1 - G_3)$,
$G_1 - (G_2 * G_3) = (G_1 - G_2) + (G_1 - G_3)$

Figure 12 — An example of $G_1 * G_2$ (panels: $\gamma P(G_1)$, $\gamma P(G_2)$, $\gamma P(G_1) * \gamma P(G_2)$, $G_1 * G_2$)

Generation of subgraphs

A standard operation in graph theory 1 is to compute
$C^n$, which gives the number of paths of length n (length
is defined as the number of arcs between two communicating
nodes on a particular path) between any two
nodes in G. As can be envisioned, this is a costly operation,
particularly if the graph is large. Instead of asking
how many paths of length n there are between x and y, we now ask
whether there is a path from x to y of length n. We may
achieve this by using boolean matrix operations. Thus,
in raising the C-matrix to a power we replace all ordinary
summations by boolean summations. If P is the product
matrix $A \times B$, where matrices A and B have dimensions
$m \times r$ and $r \times n$ respectively, and $\times$ is the symbol for
boolean matrix multiply, then
$p_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj}$
(where $p_{ij}$, $a_{ij}$, $b_{ij}$ are elements of matrices
P, A, B respectively). We denote the nth power of the
C-matrix resulting from boolean matrix multiply by
$C^n$ in what follows. $C^n$ is a zero-one matrix. It is associated with a
graph which possesses an arc (x, y) if and only if there
exists a path of length n from x to y. For example,
Figure 13 shows the C-matrix associated with a graph and
the powers of C. $C^2$ shows that there exist paths from
a to d, b to e, and c to e, of length 2. $C^3$ shows that there
is only one path of length 3 in the entire graph, and that
is from a to e. $C^4 = 0$, which implies that there is no
path of length 4 or more in the graph.

Boolean summation may also apply to the addition of
connection matrices. A + B = C (where A and B have
the same dimensions) implies $c_{ij} = 0$ if $a_{ij} = b_{ij} = 0$;
$c_{ij} = 1$ otherwise. Using the examples given in Figure 13
we show the result of $C + C^2 + C^3$ in Figure 14. The
type of boolean matrix in Figure 14, which we shall call
the “transitivity matrix,” or “T-matrix,” indicates whether
there is a path between any pair of nodes. It also gives
$\hat{F}x$ and $\hat{F}^{-1}x$ for any x. For example, the column labels
associated with the non-zero elements of row vector b
constitute the set $\hat{F}b$ (i.e., {d, e}), and the row labels
associated with the non-zero elements of column
vector b constitute the set $\hat{F}^{-1}b$ (i.e., {a}).

With the utility of the T-matrix there are a number
of standard functions under NPPL for generating
subgraphs. We shall introduce a few below:

FTRAN (G, x): Construct subgraph (X', A') of G = (X, A) where $X' = \{x\} \cup \hat{F}x$.

ITRAN (G, x): Construct subgraph (X', A') where $X' = \{x\} \cup \hat{F}^{-1}x$.

TRAN (G, x): Construct subgraph (X', A') where X' = Fx.

FTRAN (G, x, k): Construct subgraph (X', A') where $X' = \{x\} \cup \left(\bigcup_{i=1}^{k} F^{i}x\right)$.

ITRAN (G, x, k): Construct subgraph (X', A') where $X' = \{x\} \cup \left(\bigcup_{i=1}^{k} F^{-i}x\right)$.

For example, if we name the graph in Figure 6 (a project
scheduling network) $G_0$, the statement $G_1$ = ITRAN
($G_0$, H) would produce a graph showing all the activities
which are prerequisite to the installation of heating
(H), including the node H (Figure 15). The graph which
represents activities between foundation (F) and flooring
(FI) may be obtained by the statement $G_6$ = FTRAN
($G_0$, F) * ITRAN ($G_0$, FI) (Figure 16).

Figure 13 — Examples of $C^n$ matrices

Figure 14 — A transitivity matrix

Graph reduction and expansion

Operating on the “basic diagrams” as explained
earlier, the NPPL can successively reduce a graph to
various levels of complexity as may be specified by the
user. Some of the standard reduction functions are as
follows:

RSPI(G, x, y): Reduce the interior nodes of closed serial path R(x, y) in G, and the arcs between them, into a single node.

RSP(G, x, y): Reduce closed serial path R(x, y) into a single node.

ROB(G, x): Reduce out-branches of node x in G such that the set Y (defined earlier) is a single node.

RIB(G, x): Reduce in-branches of node x such that the set W (defined earlier) is a single node.

RPPI(G, x, y): Reduce interior nodes of closed parallel path PP(x, y) in G into a single node.

RPP(G, x, y): Reduce closed parallel path PP(x, y) in G into a single node.

As an example of utilizing some of the functions
mentioned, Figure 17 shows a series of reductions
beginning with the full graph of the project scheduling
network $G_0$ of Figure 6.

Figure 15 — An example of ITRAN (G, x): all activities prerequisite to heating, $G_1$ = ITRAN ($G_0$, H)

Figure 16 — Activities between foundation and flooring

Figure 17 — Examples of graph reduction (panels: RSP, RPPI; RSP, RPP, RSP, ROB, RIB)

Graph expansion is essentially the reverse of graph
reduction. The nodes to be operated on must be
compressed (macro) nodes. The expansion may be done
in a single phase, or in several phases, e.g., the out-branches
are expanded first, then the in-branches, then
the serial paths, etc.
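
The boolean matrix powers and the transitivity matrix described under “Generation of subgraphs” can be sketched in Python. The five-node graph below is an assumption: it is one graph consistent with the paths listed in the text for Figure 13 (arcs a-b, a-c, b-d, c-d, d-e), not a reproduction of the figure. The helper names `bool_mult`, `bool_add`, `transitivity`, `ftran`, and `itran` are hypothetical, and the last two return only the node sets X', not the arc sets.

```python
def bool_mult(a, b):
    """Boolean matrix product: p[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    return [
        [int(any(a[i][k] and b[k][j] for k in range(len(b)))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def bool_add(a, b):
    """Element-by-element inclusive-or of two matrices of equal dimensions."""
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def transitivity(c):
    """T-matrix: boolean sum of the powers C, C^2, ..., C^n for an n-node graph."""
    t, p = c, c
    for _ in range(len(c) - 1):
        p = bool_mult(p, c)
        t = bool_add(t, p)
    return t

def ftran(labels, t, x):
    """Node set {x} plus every node reachable from x (non-zero entries of row x)."""
    i = labels.index(x)
    return {x} | {labels[j] for j in range(len(labels)) if t[i][j]}

def itran(labels, t, x):
    """Node set {x} plus every node from which x is reachable (column x)."""
    j = labels.index(x)
    return {x} | {labels[i] for i in range(len(labels)) if t[i][j]}

labels = ["a", "b", "c", "d", "e"]
C = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
T = transitivity(C)
print(ftran(labels, T, "b"), itran(labels, T, "b"))
```

A statement such as FTRAN ($G_0$, F) * ITRAN ($G_0$, FI) then corresponds to intersecting the two node sets and keeping the arcs among the surviving nodes.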

In conclusion, we would like to mention that the 
network picture processing language (NPPL) is not 
designed solely for the purpose of generating and 
manipulating network pictures. A greater objective is to 
provide an over-all control system through which 
man-computer experiments can be performed. We 
envision that once a picture is constructed (whether by 
initial composition, or by merging/decomposition 
operations) the user may assign (or reassign) input data 
to any node (or arc) by selecting the desired node 
(or arc) on the CRT. An input page would then appear
on the CRT with a pre-designated format to guide the
user in inputting data. The matrix representation of
each graph, as previously explained, would also serve as
pointers to the storage areas of data pages. The simulation
phase would then follow the input phase. During
any phase of the experiment, control may be returned
to the picture composition and processing phase in order
to maximize man-computer interaction.

On-line recognition of hand-generated
symbols*


by GEORGE M. MILLER 

University of California 
Berkeley, California 

INTRODUCTION 

With the growth of information processing systems
incorporating large data bases, many situations arise
in which the data to be entered is a human's analysis
of a problem. Often it will be undesirable to require the
user to learn to type, and this mode can be cumbersome
for random two-dimensional entry on a form or drawing.
Using an electronic tablet coupled to a display tube
would make it convenient for the user to point to a
correct answer or print it in a very natural way. This
paper describes a new technique for converting these
handwritten symbols to code words which can subsequently
be processed by a computer.

It might be supposed that handwriting is not speed 
competitive with keyboard methods. Donald Devoe 1 
of Sylvania's Applied Research Laboratory has re- 
cently conducted several experiments which indicate 
the contrary. Although handprinting of capitals and 
numerals is about five times slower than a skilled typist 
copying prose, the former compares favorably with the 
rate for untrained typists (i.e., about 60 characters/ 
minute). In a task of making geometrical measurements 
on a drawing and recording this data in a table, Devoe 
found that handprinting required only about two-thirds 
of the time required using a keyboard. This difference 
was still evident with his subjects after six days of 
practice. Hence it may be anticipated for such application
areas as computer-assisted instruction, 2 input of
mathematical, logical and chemical formulas in canonical
forms, input and manipulation of matrices, program
debugging, 3 specifying and designing systems by means
of flowchart symbols, and two-dimensional game
playing, that handprinting will not only be desirable
to users, but also an efficient means of computer entry.

Research in hand-printed symbol recognition has 
been evident in the technical journals for more than 
a decade. The reader will soon discover that most 
symbol recognition literature is concerned with hard 
copy or off-line input. Typically, an optical scanner is 
used to obtain a two-dimensional array of points from 
a completed hand-printed character. The major effort 
of many researchers has been the exploration of unique 
methods of preprocessing or feature extraction to re- 
duce the dimensionality of this raw data. 4 Others have 
placed greater relative emphasis on classification tech- 
niques and on the selection of features from a feature 
set or pool. 5 More recently several workers, including 
Duda and Hart, 6 have made use of context to improve 
recognition performance. 

The electronic tablets used to obtain on-line source 
data provide a nearly exact trace of the path of the 
writing instrument and the order of the composite 
strokes used to inscribe a symbol. This time-sequence 
information is a great boon to machine recognition, 
but cannot be obtained by scanning off-line source 
images. For example, many individuals make 5's
which look almost identical to their S's. However, an
on-line recognizer will have no difficulty in distinguishing
between this pair if the former is made with two
strokes while the latter is made with only one. Similarly a
lower case b and a numeral 6 are readily distinguished
if their loops are inscribed in opposite directions of
rotation.

Although on-line recognition systems have the ad- 
vantage of low noise input data with higher information 
content, a number of challenges face the designer. He 
may desire a recognizer program which is invariant 
to size and position of the input symbol, has automatic 
means for detecting when a symbol is completed, is 
relatively insensitive to minor perturbations from ideal 
symbol shapes, has sufficient resolution to accommodate 
the wide range of symbols used in languages and the
professions, is easily trained to the writing style of an 
individual, and which requires a minimum amount 
of memory space and computation time. 

The author’s on-line recognizer has been imple- 
mented on hardware typical of that found in a modern 
computer graphics environment. The components are 
shown in Figure 1 and include a time-shared computer, 
a CRT display, and a Rand Tablet. 7 When the pen 
switch is closed by pressing the stylus on the tablet 
surface, the sequence of filtered pen track coordinates, 
along with control bits to indicate the end of strokes, 
are temporarily stored in a buffer. In order to permit 
immediate display of the “ink trace,” this operation is 
performed in a PDP-5 peripheral processor. The 
PDP-5 and the display control use a common core 
memory. Communication with the Berkeley Time 
Sharing System 8 (TSS) is by means of a half duplex 
link with a capacity of approximately 50K bits per 
second. The TSS schedules the user’s dictionary building 
or recognition routines and has access to the PDP-5 
memory. The user interacts with and controls these 
routines using either a teletype or by pointing to light 
buttons on the display. The recognition routine operates 
on the track coordinate data to determine when an 
input symbol has been completed and whether the 



Figure 1 — Hardware used in recognition research 


symbol closely matches any of those previously stored 
in the user’s dictionary. The output consists of an 
identification code, and data on the size and location 
of the recognized symbol. 

BACKGROUND 

Two recognition algorithms developed by other re- 
searchers will be partially described in the next few 
paragraphs. (Additional background in on-line recogni- 
tion techniques is contained in the dissertation from 
which this paper is abstracted 9 ). The purpose of in- 
cluding this material is not to make a thorough com- 
parison or evaluation, but simply to point out several 
limitations in their methods which led to a search for 
the techniques described in the body of this report. 

G. F. Groner, 10 of the Rand Corporation, has de- 
veloped an on-line recognizer which has successfully 
been applied to a larger system for creating, editing, 
and executing computer code and flow charts. Strokes 
are identified via a data-dependent sequence of tests 
determined by the system designer. The first four 
stylus directions are used to divide the strokes into 
groups. Further tests depend upon the particular sub- 
set of strokes and are chosen from the following: the 
number and/or relative position of corners, the rela- 
tive position of starting and ending points, the number 
and/or positions of relative maxima and minima in y, 
and the fifth and succeeding stylus directions. The 
recognition of multiple-stroke symbols is based on cor- 
rectly classifying the constituent strokes and their 
spatial relationships. 

The Rand recognizer has several limitations. It can- 
not conveniently be modified for individual printing 
styles. Adding or deleting symbols is complicated be- 
cause these operations frequently require changes in 
the tests used on resident symbols. The selection of 
features and the ordering of tests are based on an 
intuitive analysis of data obtained from a subset of 
potential users. There does not appear to be any con- 
venient way of optimizing this design procedure. 

M. I. Bernstein and T. G. Williams, 11 of the System 
Development Corporation, have recently described 
an on-line recognizer in which each user of the system 
may build a dictionary of the symbols he desires for 
his particular application. Strokes are divided into 
segments if they contain corners. Segments with a 
large or small aspect ratio are coded as vertical or 
horizontal lines respectively. Otherwise the segment is 
circumscribed with a minimum rectangle divided into 
the five sub-areas shown in Figure 2. The path of the 
segment is now retraced and each time a boundary 
is crossed, the number of the newly entered sub-area
is added to a string to form an “area-sequence signature.”
In addition to the segment signature, the dictionary
entries specify the geometrical relationship
between the component segments and strokes of sym- 
bols. The distance between the center of each successor 
segment with respect to the center of the collection of 
its predecessors is quantized as coincident (C), proxi- 
mate (P), or far (F). If the successor segment or stroke 
is proximate or far, the direction of its center with 
respect to the center of the collection of predecessors 
is quantized to one of eight sectors. 

The SDC system requires an exact match of segment 
signatures and their spatial data for recognition so that 
a user’s dictionary should contain all of the variations 
anticipated. As an example, Figure 2 shows that the 
first stroke of a numeral 4 could have three different 
area-sequence signatures. For each of these the second 
stroke could be in any of the four spatial positions 
shown. It is very unlikely that a particular user would 
produce all of the twelve possible combinations, but 
half this number is likely. Mr. Bernstein has indicated 
that on the average he requires three or four dictionary 
entries per symbol and that certain symbols require two 
or three times this number. 

Although multiplicity of dictionary entries may not 
be a serious limitation of the SDC recognizer, it seemed 
desirable to this author to find a symbol descriptor and 



Figure 2 — Multiplicity of dictionary entries (panels: area-sequence signatures possible for first stroke of 4; spatial variations possible for second stroke)


recognition technique which would permit a high 
recognition rate but require fewer dictionary entries 
per symbol. Two concepts are used to obtain this goal. 
On the one hand it does not seem necessary or desirable 
to require a rigid geometrical relationship between the 
component strokes or segments of a symbol unless this 
information is needed for classification. If the numeral 
4 is the only symbol which is generated using a two- 
stroke sequence similar to L followed by I, then there is 
no need to require any particular spatial relationship 
between the strokes. It follows, however, that some 
sort of automatic procedure is needed to determine 
which spatial information in a large set of symbols is 
redundant. A second way to reduce the number of
dictionary entries is to devise a segment signature
scheme which lends itself to the use of best-match
techniques. The idea is to compute the degree of
similarity of an input segment with a set of prototypes
and choose the closest match. With this capability it
should not be necessary to store combinations of 
moderately distorted segments, but only nominal 
shapes. 

CVS signature and Lee metric 

General description 

In both of the above-mentioned recognition methods, 
an input symbol is classified on the basis of a number 
of discrete decisions. As a general principle it seems 
preferable to retain full information at each interme- 
diate stage in the symbol recognition process. 12 Stated 
another way, it is desirable to have a smooth trans- 
formation between data spaces. A segment descriptor 
can be thought of as performing a mathematical trans- 
formation on the sequence of pen track coordinates. 
The idea of a smooth transformation is analogous to 
that of a continuous transformation in the mathe- 
matical sense. 

The argument for the principle of smooth transfor- 
mations can be made by an example. Consider as two 
segment classes the numeral I and the right angle L. As 
the lower half of an ideal I is rotated counterclockwise 
the generated symbol will pass through a transition 
region where the probability of its being in class L in- 
creases and the probability of its being in class I de- 
creases. A good segment descriptor and classification 
method should reflect this continuous change. In the 
case of handwritten symbols, it is also desirable to have 
a feature space which is invariant to symbol size and 
position. 

The author of this paper has conceived and tested
a segment descriptor and metric which attain the
goals mentioned in the previous paragraph. This new
method employs what will be called the contour vector
sequence (CVS) and has some similarity to an encoding 
scheme described by Freeman. 13 In Freeman's method 
a square mesh is superimposed on the arbitrary curve 
to be encoded. Mesh nodes lying closest to the inter- 
sections between the curve and the mesh define a 
straight-line approximation to the given curve. The 
scheme is illustrated for two symbols in Figure 3. Suc- 
cessive nodes can only be one of eight, so the resulting 
encoding is a sequence of octal digits. The number of 
elements in the chain is directly proportional to the 
length of the curve. In a subsequent paper Feder and 
Freeman 14 use this encoding technique to fit a given 
curve to a similarly-shaped section of a larger curve. 
However the method is size variant and cannot be used 
for measuring the degree of similarity between two 
arbitrary segments. 

In the author's CVS encoding scheme the contour
of a segment is subdivided into six nearly equal length
arcs which are approximated by their associated chords.
Each of the chords is quantized to a vector having one
of eight possible directions. Hence the resulting signature
is a vector $CVS = s_1 s_2 s_3 s_4 s_5 s_6$ of six components,
where each component takes a slope value between
zero and seven. (See Figure 4.)

Figure 3 (octal encoding of adjacent mesh points; example chains 6654322 and 665432100)

The degree of dissimilarity between two segments
is obtained by summing the absolute rotational difference,
expressed in angular units of $\pi/4$ radians, between
corresponding components of the associated
contour vector sequences. This distance measurement
is equivalent to the Lee metric used in coding theory. 16
Specifically, if segment A has

$$CVS_A = a_1 a_2 \cdots a_6 \qquad (1)$$

and segment B has

$$CVS_B = b_1 b_2 \cdots b_6 \qquad (2)$$

then the Lee distance (which will also be called the mismatch)
between the segments A and B is given by

$$D_L(A, B) = MM_{A,B} = \sum_{i=1}^{6} |d_i| \qquad (3)$$

where

$$c_i = |a_i - b_i| \qquad (4)$$

$$|d_i| = \begin{cases} 8 - c_i, & \text{for } 5 \leq c_i \leq 7 \\ c_i, & \text{otherwise} \end{cases} \qquad (5)$$

Clearly

$$0 \leq |d_i| \leq 4 \qquad (6)$$

As $|d_i|$ cannot exceed 4 angular units, the maximum Lee
distance between two segments is 24.
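
Equations (3) through (5) translate directly into a few lines of Python. This is a sketch under the assumption that a signature is held as a list of six integers from 0 to 7; the function name `lee_distance` is hypothetical. The first example uses the two six-component signatures printed with Figure 7.

```python
def lee_distance(cvs_a, cvs_b):
    """Sum of rotational differences, in units of pi/4, between two signatures."""
    total = 0
    for a, b in zip(cvs_a, cvs_b):
        c = abs(a - b)                     # equation (4)
        total += 8 - c if c >= 5 else c    # equation (5): wrap around the circle
    return total

print(lee_distance([6, 4, 4, 4, 4, 6], [6, 5, 3, 3, 5, 6]))  # signatures from Figure 7
print(lee_distance([0] * 6, [4] * 6))                        # opposite directions throughout
```

Directions 0 and 7 are neighbours on the circle, so they differ by one unit rather than seven; this wrap-around is what distinguishes the Lee metric from a plain absolute difference.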

Figure 4 illustrates the contour vector sequences for 
an alpha and a delta, and calculates a Lee distance of 
ten between these two symbols. The figure also shows 
a mismatch of only two between somewhat different 
alpha segments. The latter is an example of the smooth 
transformation between the pattern space and the CVS 
feature space. Data obtained from an experimental 
recognition program has shown that similar segments 
are mapped into points in the feature space which are 
close together in terms of the Lee metric. This clustering
of segments which look alike to humans makes it
possible to store only nominal segments and use the
metric to recognize non-ideal segments on a nearest
prototype basis. The property also can be used to advise
a person that certain symbol pairs which he creates are
very “close” to each other and may give trouble in
either human or machine recognition.

Figure 4 — CVS encoding and mismatch calculation (the upper symbols are stored prototypes; the lower symbol is the input)
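
Nearest-prototype recognition with the Lee metric can be sketched as follows. The dictionary contents are illustrative only: the signatures 644446 and 653356 are those printed with Figure 7, 543175 is the signature the paper gives for a numeral 6, and the labels `proto_a` and `proto_b` as well as the function names are hypothetical.

```python
def lee(a, b):
    """Lee distance between two equal-length signatures with components 0..7."""
    return sum(min(abs(x - y), 8 - abs(x - y)) for x, y in zip(a, b))

def classify(cvs, prototypes):
    """Return the label of the stored prototype with the smallest mismatch."""
    return min(prototypes, key=lambda name: lee(cvs, prototypes[name]))

prototypes = {
    "proto_a": [6, 4, 4, 4, 4, 6],
    "proto_b": [6, 5, 3, 3, 5, 6],
    "six": [5, 4, 3, 1, 7, 5],
}
# A slightly distorted input segment still falls closest to its nominal shape.
print(classify([6, 4, 3, 4, 4, 6], prototypes))
```

The same distance function could be run over all pairs in a user's dictionary to flag symbol pairs whose mismatch is small.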

Choice of six component CVS 

Several factors contributed to the choice of six components
in the CVS. In order to reduce storage requirements
for a user's dictionary, it is desirable to use
as few components as possible. On the other hand the
CVS must provide sufficient resolution to distinguish
between classes in a large set of symbols. Experiments
were conducted with a variety of symbol shapes in
order to obtain a compromise between these two goals.

As a minimal requirement it was felt that an on-line 
recognizer should accommodate handprinting of the 
teletype symbols shown in Figure 5. If strokes con- 
taining cusps (such as the 3 and 9) are subdivided into 
less intricate segments, this set of symbols can be 
conveniently printed using the 28 segments shown in 
Figure 6. As many of these prototype segments are 
symmetric about an axis, it appears desirable to have 
an even number of components in the CVS. Figure 6 
gives visual evidence that very little shape information
is lost if these segments are approximated with six
components. This number of components also provides
a minimum Lee distance of four between any pair of segments.
The symbol pairs having this lowest mismatch
are 1-J, C-<, S-J, and U-V.

Figure 5 — Hand printed teletype symbols

Although a contour vector sequence having four 
components probably would be sufficient for many ap- 
plications involving a small number of symbols, six 
components are needed to distinguish between the 
symbol pair S — J of Figure 7. The figure also shows 
that only four components provide a rather poor 
straight line approximation of intricate curves such 
as the theta or the lower case e. A final factor affecting 
the choice of a six component CVS for further experi- 
mental investigations was the 24 bit word length of 
the computer, leaving six additional bits for other 
kinds of segment data. 
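
The storage arithmetic is simple: six components of three bits each occupy 18 bits, leaving six bits of a 24 bit word. A minimal sketch of such packing follows; the bit layout (extra data in the top six bits, $s_1$ in the next three, and so on) and the names `pack` and `unpack` are assumptions, since the paper does not give the actual word format.

```python
def pack(cvs, extra=0):
    """Pack six 3-bit components and 6 bits of other data into one 24-bit word."""
    word = extra & 0x3F
    for s in cvs:
        word = (word << 3) | (s & 7)
    return word

def unpack(word):
    """Recover the six components and the 6 extra bits from a packed word."""
    cvs = [(word >> (3 * (5 - i))) & 7 for i in range(6)]
    return cvs, (word >> 18) & 0x3F

word = pack([5, 4, 3, 1, 7, 5], extra=9)
print(oct(word), unpack(word))
```

With no extra data, the packed word read in octal is simply the signature itself.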

Computational algorithms 

The computation of the contour vector sequence
begins with a pre-processing operation on the raw pen
data. The Rand Tablet (see Figure 1) has a resolution
of .01 inches and is sampled every seven milliseconds
to obtain the location of the stylus. To reduce redundancy
and filter out spurious noise from the tablet, the
PDP-5 accepts a new coordinate $(x_j, y_j)$ only if the
following three conditions are satisfied:

$|x_j - x_i| < K_1$ (7)

$|y_j - y_i| < K_1$ (8)

$[\,|x_j - x_i| > K_2\,] \text{ OR } [\,|y_j - y_i| > K_2\,]$ (9)

where $(x_i, y_i)$ is the last coordinate accepted, $K_2$ defines
an inner window, and $K_1$ defines an outer window.
$K_1$ and $K_2$ are preset to ten and three respectively,
but may be changed from the teletype using the command
SET PARAMETERS. Owing to the high
sampling rate, a new point is stored whenever the x or
y coordinate changes .03 inches from the previously
accepted point. This amount of resolution has been
found sufficient for subsequent computations on 1/4
inch high symbols, but $K_2$ may be increased for larger
symbols.

Figure 6 — Segments used to print teletype symbols
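The acceptance test of conditions (7) through (9) can be sketched in a few lines of Python. This is an illustrative sketch only, not the PDP-5 routine: the function name `filter_points`, the use of absolute differences, the strict inequalities, and the unconditional acceptance of the first sample are all assumptions. Coordinates are taken to be in tablet units of .01 inch.

```python
def filter_points(samples, k1=10, k2=3):
    """Keep a sample only if it lies inside the outer window (k1) and
    outside the inner window (k2) around the last accepted point.

    Hypothetical helper: the first sample is accepted unconditionally,
    which the paper does not state explicitly.
    """
    accepted = []
    for x, y in samples:
        if not accepted:
            accepted.append((x, y))
            continue
        xi, yi = accepted[-1]
        dx, dy = abs(x - xi), abs(y - yi)
        inside_outer = dx < k1 and dy < k1      # conditions (7) and (8)
        outside_inner = dx > k2 or dy > k2      # condition (9)
        if inside_outer and outside_inner:
            accepted.append((x, y))
    return accepted
```

In this sketch a jump larger than the outer window is treated as spurious noise and discarded, while a movement smaller than the inner window is treated as redundant.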



644446 653356 6446 6446 



Figure 7 — Four and six component CVS’s 

From the above mentioned filtering process the
contour of a segment is represented by a sequence of
nearly equally spaced x-y coordinates. These points are
used to obtain a six-chord approximation to the segment.
The algorithm (see Figure 8) consists of dividing
the number of coordinate points less one by six, and
taking the quotient (Q) as the nominal distance between
adjacent chord points $(z_i, z_{i+1})$. If the division
produces a remainder (R) it is distributed between the
chords. If $R \ge 3$, an extra point is added when forming
each of the odd chords $z_0 z_1$, $z_2 z_3$, and $z_4 z_5$. If R is equal
to one or four the last coordinate is removed and if
R is equal to two or five the first is also neglected. These
rules are summarized in the table of Figure 8.
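The remainder rules above can be illustrated with a short Python sketch. It is one reading of the text, not a transcription of Figure 8: the function name `chord_points` is hypothetical, and it is assumed that for R equal to two or five both the first and the last coordinates are dropped, and that the extra point goes to the first, third, and fifth chords.

```python
def chord_points(points):
    """Return the seven chord points z0..z6 of a six-chord approximation.

    Assumes at least seven filtered coordinates; shorter segments are
    handled separately in the paper.
    """
    n = len(points)
    if n < 7:
        raise ValueError("need at least seven coordinate points")
    q, r = divmod(n - 1, 6)          # nominal chord length and remainder
    pts = list(points)
    if r in (1, 4):
        pts = pts[:-1]               # remove the last coordinate
    elif r in (2, 5):
        pts = pts[1:-1]              # also neglect the first coordinate
    extra = 1 if r >= 3 else 0       # odd chords get one extra point
    indices = [0]
    for k in range(6):
        step = q + (extra if k % 2 == 0 else 0)
        indices.append(indices[-1] + step)
    return [pts[i] for i in indices]
```

With ten points, for example, Q is 1 and R is 3, so the chords span 2, 1, 2, 1, 2, and 1 intervals and together cover all nine intervals of the segment.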

The final step in computing the CVS signature is a
quantization of the chord directions into one of eight
sectors. The first component of the CVS is computed
using the $\Delta Y$ and $\Delta X$ associated with the chord $z_0 z_1$. In
the example of Figure 8, $\tan 22.5^\circ < |\Delta Y / \Delta X| < \tan 67.5^\circ$,
$\Delta X < 0$, and $\Delta Y < 0$, indicating the direction 5.
A similar application of the Quantization Table to the
remaining chords results in a signature of 543175 for the
numeral 6 shown.
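A minimal sketch of the direction quantization follows. The sector numbering is an assumption inferred from examples in the text (a bar drawn bottom to top is 000000, a left-to-right bar is 222222, and a chord with both differences negative in the diagonal band is 5), which suggests sectors counted clockwise from "up" with y increasing upward. The names `quantize_direction` and `cvs_signature` are hypothetical, and the Quantization Table of Figure 8 is replaced here by an arctangent.

```python
import math

def quantize_direction(dx, dy):
    """Map a chord (dx, dy) to one of eight 45-degree sectors, 0..7.

    Assumed convention: 0 = up, 2 = right, 4 = down, 6 = left,
    counted clockwise, with y increasing upward.
    """
    if dx == 0 and dy == 0:
        raise ValueError("zero-length chord has no direction")
    angle = math.degrees(math.atan2(dx, dy))   # clockwise angle from +y
    return round(angle / 45.0) % 8

def cvs_signature(chord_pts):
    """Concatenate the quantized directions of successive chords."""
    pairs = zip(chord_pts, chord_pts[1:])
    return "".join(
        str(quantize_direction(x1 - x0, y1 - y0))
        for (x0, y0), (x1, y1) in pairs
    )
```

Sector boundaries fall at odd multiples of 22.5 degrees, which agrees with the tangent limits quoted in the example.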
Figure 8 — Algorithm for computing CVS
Figure 9 — Algorithm for computing $|c_i|$


If the number of coordinate points is less than seven, 
the CVS is computed in a different manner. Segments 
having four to six points are assumed to be straight 
lines and only the two end points are used. The 
quantized slope of the chord between these points is
assigned to each CVS component. For example a short
mid bar in the letter F would have a 222222 signature.
If the number of coordinate points is in the range one 
to three, the segment is assumed to be a dot and as- 
signed a CVS of 000000. Although this signature is 
also that of a vertical bar drawn bottom to top, the 
latter is not commonly inscribed. However, the dot 
could just as well be assigned any signature having a 
large Lee distance with respect to the other segments 
employed. By treating short segments in the manner 
described, it is possible to utilize a larger $K_2$ (inner
window) and thereby reduce storage requirements 
and computational time. 

The Lee distance between two segments is obtained
by summing the absolute rotational difference $|c_i|$
between corresponding components $(a_i, b_i)$ of the associated
CVS's. As indicated by equation (5), $|c_i|$ cannot
always be obtained by simply computing the absolute
algebraic difference $d_i$ between $a_i$ and $b_i$. For example
two components with quantized directions 7 and 2 have
$d_i = 5$, but $|c_i| = 3$. Equation (5) also shows that
when $d_i$ is 6 or 7, the respective $|c_i|$'s are 2 and 1.
However the lower table in Figure 9 demonstrates that
when $5 \le d_i \le 7$, the correct $|c_i|$ is obtained from the
least three significant bits of the 2's complement of
$d_i$. This simple algorithm has been implemented with
standard machine instructions.
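The two's complement shortcut is easy to check in a few lines. The following Python sketch is illustrative; the function names are hypothetical and signatures are assumed to be strings of octal digits such as "543175".

```python
def rotational_difference(a, b):
    """Absolute rotational difference between two directions 0..7."""
    d = abs(a - b)                 # absolute algebraic difference
    if d <= 4:
        return d
    return (-d) & 0b111            # low three bits of the 2's complement

def lee_distance(cvs_a, cvs_b):
    """Sum of rotational differences over corresponding components."""
    if len(cvs_a) != len(cvs_b):
        raise ValueError("signatures must have the same length")
    return sum(
        rotational_difference(int(p), int(q))
        for p, q in zip(cvs_a, cvs_b)
    )
```

For directions 7 and 2 the algebraic difference is 5 and the masked two's complement gives 3, as in the example above. The largest possible distance between two six-component signatures is 24.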

Dissecting strokes into segments 

It is well known in mathematics that continuous
transformations depend upon well behaved functions.
If a function is not continuous and/or analytic, it may
be necessary to apply the transformation separately
to a piecewise approximation of the function. In an
analogous way the smooth transformation property
of the CVS signature and Lee metric is related to the
geometrical properties of the two dimensional entities
on which it operates. Discontinuities in the pattern
space are easily handled because in an on-line implementation
the start and end of a stroke are reliably
indicated by a micro-switch in the writing stylus. Hence
it is possible to compute a separate CVS for each component
stroke in a symbol. The manner in which the
Lee metric is applied to multi-stroke symbols is discussed
in the next section.

Figure 10 — Improvement obtained by segmenting

Sharp corners or cusps in a stroke correspond to 
points in which the derivative of a function is not de- 
fined, and can be troublesome to the CVS transforma- 
tion. A lower case z and a numeral 3 will be used as 
an example. Figure 10 shows that a slightly distorted 
3 may have less mismatch with the prototype z than 
with the prototype 3. However, the lower half of 
the figure demonstrates that the smooth transformation
property can be restored if each of the
symbols is separated at the cusp into two segments.
Now the upper left-cup common to both symbols has
the same prototype CVS, and classification depends
only upon the dissimilar lower parts. Experiments
have shown that if a large number of symbols in a user
set contain cusps, segmentation on cusps results in
a higher percentage of correct classifications.

Several methods were investigated for dissecting 
strokes into segments. The first technique tried de- 
pended upon detecting the relatively slow pen veloc- 
ity in the vicinity of cusps. An inverse measurement 
of pen velocity was obtained by counting the number of 
tablet coordinates rejected in the pre-processing opera- 
tion. Experimental data showed that the writing veloc- 
ity differed between users and also for different symbols 
inscribed by the same user. Relatively slow velocity 
was observed for smooth portions of strokes as well as 
at cusps. In general the results of segmentation on 
velocity measurements were found to be unreliable. 

The dissecting method used in the experimental 
recognizer locates cusps using geometrical measure- 
ments and is insensitive to pen velocity. The algorithm 
operates on the sequence of filtered coordinates which 
approximate a stroke. Cusps are isolated when the in- 
cluded angle between three successive points is less 
than a constant (normally set to 30°). Although cusps 
are reliably detected, sharp corners may or may not 
cause segmentation. Hence if a user desires to employ 
such one stroke symbols as a narrow V or N, he may 
need to include alternative descriptors in his diction- 
ary. 
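The geometric cusp test can be sketched as follows. This is an illustration under stated assumptions rather than the recognizer's routine: the names `included_angle` and `split_at_cusps` are hypothetical, the angle is measured at the middle of three successive filtered points, and the cusp point is assumed to end one segment and begin the next.

```python
import math

def included_angle(a, b, c):
    """Angle in degrees at b between the directions b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 180.0               # coincident points: treat as smooth
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))

def split_at_cusps(points, threshold_deg=30.0):
    """Split a stroke wherever the included angle drops below the threshold."""
    segments, start = [], 0
    for i in range(1, len(points) - 1):
        angle = included_angle(points[i - 1], points[i], points[i + 1])
        if angle < threshold_deg:
            segments.append(points[start:i + 1])
            start = i              # the cusp also starts the next segment
    segments.append(points[start:])
    return segments
```

A right-angle corner has an included angle of 90 degrees and is not split by this test, which is consistent with the remark that sharp corners may or may not cause segmentation.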

Experimental symbol recognizer

Multiple stroke symbols 

A symbol may be composed of several strokes and 
one symbol can be a subset of others. Consequently 
an on-line recognition program must provide some 
means to detect symbol completion. One possible 
technique divides the writing surface into a grid and 
each symbol must start in a new space. This con- 
straint may be acceptable when the data to be entered 
is in tabular form, but the technique is unsuitable for 
randomly placed symbols of varying sizes. A second 
method makes use of a tree-structured dictionary. 10, 11
After a particular stroke has been classified, the dic- 
tionary is referred to for a list of permissible successor 
strokes. If the next stroke does not have an allowable 
identity and/or geometrical position, it is assumed to 
be associated with a new symbol. This technique has
the advantage of reducing dictionary search time, a 
desirable feature when there are a number of entries 
for each symbol type. However, a poorly inscribed or 
positioned stroke may not correspond to an allowable 
successor and can abort the recognition process. In 
many of these abort instances the complete symbol 
contains sufficient information for correct classification.

In the author’s recognition system classification is 
obtained from the best match of the complete input 
symbol with dictionary entries having the same number
of strokes and segments. No attempt is made to identify 
the component segments of a multi-segment symbol. 
Hence it was necessary to devise a symbol completion 
algorithm which operates independently of the recog- 
nition process. The basic technique is to center a 
stroke or a subsymbol in a somewhat larger rectangle. 
If the next stroke does not enter this rectangle it is 
assumed to start a new symbol. This procedure auto- 
matically adjusts to varying symbol sizes. 

The precise dimensions of the enclosing rectangle 
are determined by the aspect ratio of the stroke and in
some cases the predecessor stroke is also a factor. If 
the height-to-width ratio R of the initial stroke is 
in the range of 1/3 to 3, the stroke is bordered on 
the top and bottom by m/2 and on the sides by m/4, 
where m is the maximum of the stroke width or height. 






If the second stroke enters this rectangle, the combined 
strokes are enclosed with the minimum rectangle plus 
the m/2 and m/4 borders, where m is now the maximum 
dimension of the symbol or subsymbol. The procedure 
is repeated until a new stroke fails to enter the rec- 
tangle. (See example at the top of Figure 11.) 

When the initial stroke is tall and narrow with R > 
3, two different enclosing rectangles are employed. 
Rectangle A is of the previously mentioned type and 
borders the top and bottom of the stroke by m/2 and 
the sides by m/4. If the 2nd stroke enters this rather 
narrow box, the two strokes are assumed to belong to 
the same symbol or subsymbol and the A test is 
repeated. Second strokes which do not enter the A- 
rectangle but have an R > 3 are tested to see if they 
enter the B-rectangle. The latter is actually a square
of dimension 2m. When the B test has a positive result, 
the first two strokes are enclosed with an A-rectangle. 
If the 3rd stroke is completely within this rectangle, 
all three strokes are assumed to belong to the same 
symbol or subsymbol and the A-test is continued. 
However if the 3rd stroke is not enclosed by the A- 
rectangle, the first stroke is assumed to be a complete 
symbol and the 3rd stroke is treated as a possible second
stroke for the next symbol. As shown in the middle of
Figure 11, this feature of the symbol separation
algorithm permits 1’s to be more closely spaced than
the vertical bars of an H. When the initial stroke is 
wide and short with R < 1/3, the B-test is applied to 
the second stroke. If the result is positive, the A-test 
is applied to subsequent strokes. (See example at the 
bottom of Figure 11.) 

In addition to satisfying one of the spatial tests which 
have been mentioned, a component stroke of a symbol 
must be made within an interval of time (prescribed 
by the user) after the previous stroke. In this manner 
the program can detect the completion of an isolated 
symbol or the last symbol in a string. 
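The simplest case of the symbol completion algorithm, an initial stroke with R between 1/3 and 3, can be sketched as below. The sketch covers only the A-test; the B-test for tall or wide initial strokes and the time limit are omitted. The function names are hypothetical, and a stroke is assumed to "enter" a rectangle when any of its filtered points lies inside it.

```python
def bounding_box(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def a_rectangle(points):
    """Bounding box extended by m/2 above and below and m/4 on each side."""
    x0, y0, x1, y1 = bounding_box(points)
    m = max(x1 - x0, y1 - y0)      # maximum dimension of the (sub)symbol
    return (x0 - m / 4, y0 - m / 2, x1 + m / 4, y1 + m / 2)

def enters(stroke, rect):
    rx0, ry0, rx1, ry1 = rect
    return any(rx0 <= x <= rx1 and ry0 <= y <= ry1 for x, y in stroke)

def group_strokes(strokes):
    """Group successive strokes into symbols using the A-test only."""
    symbols, current = [], []
    for stroke in strokes:
        if current:
            so_far = [p for s in current for p in s]
            if enters(stroke, a_rectangle(so_far)):
                current.append(stroke)
                continue
            symbols.append(current)
        current = [stroke]
    if current:
        symbols.append(current)
    return symbols
```

Because the borders are proportional to the current size of the symbol or subsymbol, the test adjusts itself to symbols of different sizes, as the text notes.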

Segment spatial data 


Figure 11 — Symbol completion algorithm examples


If two or more symbols in a user’s set are formed 
with the same sequence of segments (e.g., the H — T or 
the n — h of Figure 12), the corresponding dictionary 
entries will contain identical contour vector sequences. 
To enable distinction, the program extracts and stores 
spatial information on the relative location of the 
center of component segments with respect to the 
center of a completed symbol. As shown in Figure 12, 
this relation is encoded as up (U), down (D), right (R), 
left (L), or coincident (C). When the spatial data for 
a particular segment in a symbol is needed for recogni- 
tion, the user sets a bit in the corresponding segment 
descriptor. The task of deciding which bits to set is
made rather simple by a routine called TROUBLESHOOT
which informs the user of all symbol pairs
having a low mismatch. As indicated by an example
set of symbols included in a later section, the segment
spatial data is seldom required. Hence the recognizer
has inherent tolerance to sloppy positioning of the
component segments of most symbols.

Figure 12 — Geometric position of segments

Dictionary building and testing

The procedure for constructing a personalized dictionary
is simple and fast. In this mode the user inscribes
a symbol and its contour vector sequences appear
on the CRT. In the experimental recognizer the
CVS’s are in numeric form, but dictionary construction
could be simplified further by displaying each CVS as
six connected vectors. The same symbol is repeated
several times to see if there are any variations. The
user then selects a representative sample and assigns
an output code by pointing to one or two characters
displayed on the CRT. (In the present arrangement
the output code may correspond to a teletype character
or a teletype character preceded by an ampersand.)
The operation is repeated for other symbols until the
dictionary contains the desired symbol set. At this
stage in the procedure the dictionary entries are initialized
not to use segment spatial data.

Next the recognizer is placed in a TROUBLESHOOT
mode which computes the mismatch between every
pair of symbols in the dictionary having the same number
of strokes and segments. Symbol pairs with a low
mismatch are displayed to the user. He may make his
dictionary more robust by prescribing spatial differences,
changing the form of symbols, or permuting
stroke order. Or he may choose to ignore symbol pairs
having low mismatch until poor recognition is actually
experienced. Unwanted symbol descriptors are removed
from the dictionary using the command DELETE
followed by the corresponding output codes. The
user can also save his dictionary on standard system
files and retrieve a dictionary from these files.

Figure 13 — Test symbol set

Figure 14 — Dictionary entries

Figure 13 shows the printing style used by the author
for 85 different symbols. The set includes the ten nu- 
merals, the upper and lower case alphabetical charac- 
ters, and 23 common teletype symbols. Lower case 
letters which are normally printed the same as upper 
case were changed to a cursive style. This was neces- 
sary because the experimental recognizer does not 
make use of relative size information. 

A printout of the dictionary entries created by the
author is shown in Figure 14. The output code selected
for lower case alphabetics consists of the cor-
responding upper case symbol preceded by an amper- 
sand. Each segment is represented by eight octal digits 
and the six on the right are used for the CVS. The 2nd 
digit from the left contains numbers 1 through 5 
to indicate the location of the geometric center of the 
segment in a rectangle enclosing the complete symbol 
(See Figure 12). This geometrical relationship is re- 
quired during recognition if the user has set the most 
significant bit of the first digit on the left. A “1” in the 
least significant bit of the 1st digit indicates that the seg- 
ment continues a stroke and this information is used 
to partition the dictionary into subsets of symbols 
having the same number of strokes and segments. 


As the middle bit is currently unused, the first digit 
can be only a 0, 1, 4, or 5. 
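The layout of a segment descriptor described above can be unpacked with a short sketch. The function names and the dictionary keys are hypothetical; the location digit is left as a number because the mapping of the values 1 through 5 to the positions of Figure 12 is not given in the text. The sample descriptors are taken from the printed test output.

```python
def decode_descriptor(descriptor):
    """Split an eight-digit octal segment descriptor into its fields."""
    if len(descriptor) != 8 or any(ch not in "01234567" for ch in descriptor):
        raise ValueError("expected eight octal digits")
    flags = int(descriptor[0])
    return {
        "spatial_required": bool(flags & 0b100),   # most significant bit
        "continues_stroke": bool(flags & 0b001),   # least significant bit
        "location": int(descriptor[1]),            # 1..5, see Figure 12
        "cvs": descriptor[2:],                     # six-component signature
    }

def count_strokes(descriptors):
    """A new stroke starts at every segment that does not continue one."""
    return sum(
        1 for d in descriptors
        if not decode_descriptor(d)["continues_stroke"]
    )
```

Under this reading the three descriptors 03444434, 04223556, and 12223566 describe a symbol of two strokes and three segments, since only the last segment continues a stroke.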

The TROUBLESHOOT parameter was set to list all
symbol pairs having a mismatch of four or less. Figure
15 shows that the pairs C-<, L-J, U-V, and V-W
each had a Lee distance of exactly four. The author decided
to accept this level of mismatch for single segment
symbols unless subsequent tests suggested a
change of form. Figure 15 also lists ten pairs of multi- 
segment symbols having a total Lee distance of four 
or less. Seven of these pairs were made more robust 
by taking advantage of reliable differences in the rela- 
tive position of component segments. Only one spatial 
bit was set in each symbol, thus allowing sloppy po- 
sitioning of segments not needed for distinction. For 
example the second stroke of the f-T was chosen be- 
cause it was felt that the horizontal bars of these two 
symbols would always be coincident and up respec- 
tively. However the first stroke of the f might at times 
be coincident and would then provide no spatial dif- 
ference with the T. The SPATIAL DATA mode is 
called by typing SP and the number in the dictionary 
list of the desired symbol. A routine then automatically 
requests a no (N) or yes (Y) decision to set the spatial 
bit for each segment of the symbol. 
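The pairwise comparison performed in the TROUBLESHOOT mode might look like the following sketch. The data layout is an assumption: each dictionary entry is a list of `(cvs, continues_stroke)` pairs, and the sample dictionary uses invented signatures for illustration only. The total mismatch of a multi-segment symbol is taken as the sum of the Lee distances of corresponding segments.

```python
from itertools import combinations

def rotational_difference(a, b):
    d = abs(a - b)
    return d if d <= 4 else 8 - d

def shape(entry):
    """(number of strokes, number of segments) of a dictionary entry."""
    strokes = sum(1 for _, continues in entry if not continues)
    return strokes, len(entry)

def mismatch(entry_a, entry_b):
    return sum(
        rotational_difference(int(p), int(q))
        for (cvs_a, _), (cvs_b, _) in zip(entry_a, entry_b)
        for p, q in zip(cvs_a, cvs_b)
    )

def troubleshoot(dictionary, limit=4):
    """List symbol pairs of equal shape whose mismatch is at most limit."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(dictionary.items(), 2):
        if shape(a) != shape(b):
            continue
        m = mismatch(a, b)
        if m <= limit:
            pairs.append((name_a, name_b, m))
    return pairs

# Hypothetical entries, not taken from the author's dictionary.
sample = {
    "P": [("444444", False)],
    "Q": [("444443", False)],
    "R": [("000000", False)],
    "S": [("444444", False), ("222222", False)],
}
```

Here `troubleshoot(sample)` reports only the pair P, Q; the entry S is never compared with the others because it has a different number of strokes and segments.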

Although the manual setting of spatial bits is greatly
facilitated by the TROUBLESHOOT mode, the
procedure does require familiarity with the fundamental
principles of the recognizer. The task could be accomplished
automatically, but would require the user to
provide additional input samples. The F-I pair in the
upper part of Figure 16 will be used as an example. Assume
that additional training samples produce the
spatial variations shown in the middle of the figure.
The spatial bit routine would determine from all five
samples that the third stroke provides reliable distinction,
and that two dictionary entries are required for the
F. (X means that the spatial bit is not set.) In contrast
to other training methods, the user is required to provide
samples only for symbol pairs having low mismatch.
The minimal use of spatial information results
in a recognizer which is very tolerant to inaccurate
positioning of the component strokes of most symbols.

Figure 15 — TROUBLESHOOT list

Figure 16 — Automatic setting of spatial bits

Symbol       Segment descriptors           Mismatch
(illegible)  04332111 02444444             1
B            03444434 04223556 12223566    2
R            03444444 04234566 02333333    2
O            01654210                      0
V            01321310                      2
N            01024311                      3
D            03444444 01234566             0
O            01654210                      0
G            01554322 05224444             3

Figure 17 — TEST on upper-case letters

A TEST mode allows the user to further evaluate 
his dictionary. He simply draws a sequence of symbols 
which are separated from each other by at least one 
quarter of the maximum symbol dimension. The 
symbol separation algorithm determines when a symbol 
has been completed, and the recognition routine guesses 
the identity of the symbol on the basis of lowest mis- 
match. Mismatch calculations are made between the 
input symbol and all dictionary entries having the 
same number of strokes and segments. If two or more 
symbols have the lowest mismatch, the first one en- 
countered in the dictionary search is chosen. Dic- 
tionary entries in which a spatial bit is set require a 
specific location for the corresponding input segment. 
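The recognition step can be summarized in a short sketch. The record layouts, the single-letter location codes, and the treatment of a failed spatial requirement as making the entry ineligible are assumptions, and the sample signatures are invented for illustration. Entries are scanned in dictionary order so that the first of several equal matches is kept.

```python
def rotational_difference(a, b):
    d = abs(a - b)
    return d if d <= 4 else 8 - d

def shape(segments):
    strokes = sum(1 for seg in segments if not seg[1])
    return strokes, len(segments)

def classify(input_segments, dictionary):
    """Return (output_code, mismatch) of the best entry, or None.

    input_segments: list of (cvs, continues_stroke, location)
    dictionary: list of (output_code, segments), where each segment is
                (cvs, continues_stroke, location, spatial_bit)
    """
    best = None
    for code, entry in dictionary:
        if shape(entry) != shape(input_segments):
            continue
        pairs = list(zip(entry, input_segments))
        # A set spatial bit requires the input segment at that location.
        if any(e[3] and e[2] != s[2] for e, s in pairs):
            continue
        m = sum(
            rotational_difference(int(p), int(q))
            for e, s in pairs
            for p, q in zip(e[0], s[0])
        )
        if best is None or m < best[1]:   # strict: first entry wins ties
            best = (code, m)
    return best

# Hypothetical two-stroke entries distinguished by the second stroke.
sample_dictionary = [
    ("&F", [("654444", False, "C", False), ("222222", False, "C", True)]),
    ("T", [("444444", False, "C", False), ("222222", False, "U", True)]),
]
```

With this dictionary an input whose horizontal bar lies in the upper position can only match T, however its first stroke is drawn.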


Symbol  Segment descriptors           Mismatch
&T      01444432 01222222             2
&H      03444444 12001444             1
&E      01216532                      0
&Q      04655321 15444444 01222222    (illegible)
&U      03442120 15444222             2
&I      01444322 04000000             0
&C      04222222 11665322             0
&K      03444440 01556133             7
&F      01654444 01121222             2
&O      01310632                      2
X       01344332 01555555             3
&J      01444566 04000000             0
&U      03443100 15444322             0
&M      03444444 13001344 15011444    (illegible)
&P      03444444 11001356             2
&S      01012456 12222222             1
&O      01410742                      3
&V      01431022                      0
&E      01206532                      1
&R      03333444 11000112             4
&T      01444422 04222222             1
&H      03444444 12011444             2
&E      01216532                      0
&L      01244432                      0
&A      05654211 15543221             4
&Z      04223455 11345012             6
&Y      04543211 11444457             7
&B      03444444 15001356             3
&R      03344444 11000122             1
&O      01420742                      2
&W      03432110 15442100             4
&N      03444444 11011444             2
&D      05653100 15444432             0
&O      01310642                      3
&G      04654210 11445011             1

Figure 18 — TEST on lower-case letters


Figures 17 and 18 show some test results obtained
on the author’s symbol set. The displayed or teletype
output from the TEST mode includes the dictionary
entry guessed, the segment descriptors for the input
symbol, and the mismatch between the former and the
latter. In this particular test the phrase “the quick
fox jumps over the lazy brown dog” was inscribed in
upper and lower case letters. Out of a total of 70 inscribed
symbols, the only error was a lower case x which
was misread as an upper case X.

CONCLUSION 

The author's dictionary entries (see Figure 13) were 
also used to classify the distorted one and two stroke 
letters of Figure 19. Except for a T which was misread as
a t, all of the characters in this figure were classified 
correctly (the amount of mismatch is shown below each 
symbol). Although previously developed on-line recog- 
nition schemes also are capable of recognizing dis- 
torted symbols, they require the user to provide a 
large number of training samples. The nearest proto- 
type technique described in this paper performs the 
task with a single dictionary entry per symbol. 

Figure 19 — Intentionally distorted symbols

The symbol recognizer has been used by many different
people and all have found it enjoyable to operate.
In one experiment three subjects were asked to construct
personalized dictionaries consisting of the numerals,
the upper case letters, and the lower case letters
which differed from upper case. Each of these persons
adjusted to using the tablet and CRT display within
15 minutes and then took about a minute to make each
dictionary entry. As the automatic means for setting
spatial bits has not been implemented, the subjects
were given brief instructions on the manual procedure.
The operation itself took about 15 minutes.

After their dictionaries were constructed the sub- 
jects were asked to write the complete alphabets and 
the phrase “the quick fox jumps over the lazy brown 
dog” in upper and lower case. From this test of 132 
symbols a user typically had two to five misreads. With 
additional experience and very slight refinements of 
dictionaries, all subjects obtained recognition rates 
in excess of 98 percent. An error rate of 5 percent is 
generally considered acceptable in on-line systems, 
because each character can be classified, displayed, and 
corrected immediately by the writer if it is wrong. 

The compiled program for the symbol recognizer 
requires approximately 9K 24-bit words of memory. 
On the average an additional 4 words are required for 
each dictionary entry. Owing to the simplicity of the 
mismatch calculations and the high speed of the SDS 
940 computer, the recognizer can easily accommodate 
normal writing rates of symbols from a set of 100.

The CVS signature and Lee metric is a fundamental 
technique for measuring the similarity of two arbitrary 
curves, and can be applied to a wide spectrum of 
pattern classification problems. The author is currently 
investigating the usefulness of the method for machine 
recognition of cursive writing in lower case letters. 
Preliminary results from this research are contained 
in Reference 9. 
Common file organization techniques compared


by NED CHAPIN 
Menlo Park, California


INTRODUCTION 

In order to make a comparison of file organization 
techniques, concurrence is needed on terminology. To 
that end, this introduction offers some definition of 
terms. Unfortunately, many of these terms do not 
have universally accepted definitions. A general defi- 
nition of terms can be found elsewhere. 6 

In offering definitions of terms, this paper does not 
suggest that those who give different definitions are 
wrong. On the contrary, the differences in definition 
that exist reflect in part imperfect communication 
among people in the field, and in part, real differences 
in the concerns of the people in the field. Hopefully, 
papers such as this one will help improve communi- 
cation. But the differences in concern will continue 
to exist and to spawn both new differences and new
terms. 

As used in this paper, the term “file organization” 
is not synonymous with file structure, data structure, 
data base, data organization, or data management. A 
file organization is viewed as a way of putting together
the components of a file. “File structure” is
viewed as synonymous with file organization, but is not
used in order to help distinguish it from “data structure.”
A “data structure” is a more general term than
file organization, since a file is viewed as but one general
organization of data. Some people use the term data 
structure to refer only to vertical relationships among 
data. “Data organization” is viewed as synonymous 
with data structure. A data base is viewed here as a 
group of files or alternatively as a controlled aggre- 
gation of data which can be regarded as organized into 
files. 

The term “data management” is used with a variety
of meanings in the field. Sometimes it is narrowly used
to refer to movement and formatting of data to and
from internal storage, and the supporting software.
Sometimes in a broader sense it also refers to the
identification of data and procedures to maintain the
integrity and security of the data. At other times, the
term is used also to refer to file organization. In a very
broad sense, it refers also to the maintenance of files,
the handling of inquiries, and the preparation of
reports.

These definitions raise questions about the definition 
of the vertical and horizontal organization of data. 
Looking first vertically, this paper views a file as an 
arbitrary, usually homogeneous, but not exhaustive
aggregation of records. Records are collections of data 
all of which share some attribute in common, usually 
the name of a thing the data are about. For example, a 
record of employee job attendance might contain data 
about number of days worked, number of days absent, 
the usual work station, the parking lot location, the 
home address, the home telephone, the usual days of 
the week absent, and the like. When these data are 
drawn together and grouped in terms of the identifi- 
cation of the employee (such as by employee identifi- 
cation number), the individual groupings thus formed 
are here viewed as records. The components of the 
record are data items (usually fields), as diagrammed in 
Figure 1. 

The definition of a record implies no specific ordering
of the data items. The definition of the file implies
no ordering of the records within the file. By ordering
is meant the application of a collating sequence or
pattern template to data items at a uniform level in
the vertical hierarchy of data. When records are ordered,
the data items used for the ordering are referred
to here collectively as the key. For example, the records
in the attendance file just cited might be ordered using
an ascending numeric collating sequence with the
employee identification numbers serving as the key.

Figure 1 — Condensed diagram of the vertical hierarchy
of data
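The relation between file, record, data item, and key can be made concrete with a small sketch. The field names and values are invented for illustration, and the function `order_file` is hypothetical; it simply applies an ascending collating sequence to the data items named as the key.

```python
# A file viewed as an unordered aggregation of records; each record is a
# collection of data items about one employee (hypothetical values).
attendance_file = [
    {"employee_id": 307, "days_worked": 21, "days_absent": 1},
    {"employee_id": 112, "days_worked": 19, "days_absent": 3},
    {"employee_id": 245, "days_worked": 22, "days_absent": 0},
]

def order_file(records, key_items):
    """Order records by the data items that together form the key."""
    return sorted(records, key=lambda r: tuple(r[item] for item in key_items))

ordered = order_file(attendance_file, ["employee_id"])
ordered_ids = [r["employee_id"] for r in ordered]
```

The unordered file and the ordered file contain the same records; only the imposed sequence differs, which is why the definition of a file itself implies no ordering.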

The horizontal organizations of data reflected in 
this paper require definitions of table, tree, string, 
and list. A “table” is a series of pairs of data items, 
which are the argument and the function. The table by 
its form permits the table user to establish by infer- 
ence a relationship between a particular argument and 
its associated function. A telephone book and a state- 
ment of tax rates are examples of tables. 

Three important tables for the comparison of file 
organizations are indexes, directories, and tables of 
contents. An “index” has the arguments in a specific 
order, but the function, which may consist of multiple
data items, may be in any order. By contrast, a “table of
contents” cites the functions in a specific order but 
leaves the arguments in any order. “Directories” may 
have the arguments or functions or both ordered in 
any manner. For this reason, the term directory serves 
as a general term covering in practice both indexes 
and tables of contents.
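The distinction between an index and a table of contents can be illustrated with argument and function pairs. In this sketch the arguments are names and the functions are storage addresses; both the data and the helper names are hypothetical. The binary search relies only on the arguments of the index being in order.

```python
import bisect

# Argument/function pairs in no particular order: a directory.
directory = [("SMITH", 40), ("ADAMS", 75), ("JONES", 10)]

def build_index(pairs):
    """An index keeps the arguments in a specific order."""
    return sorted(pairs, key=lambda pair: pair[0])

def build_table_of_contents(pairs):
    """A table of contents keeps the functions in a specific order."""
    return sorted(pairs, key=lambda pair: pair[1])

def lookup(index, argument):
    """Find the function for an argument by binary search on the index."""
    arguments = [a for a, _ in index]
    i = bisect.bisect_left(arguments, argument)
    if i < len(arguments) and arguments[i] == argument:
        return index[i][1]
    return None
```

A table of contents built from the same pairs lists them in the order in which the material is stored, so finding a given argument in it requires a scan rather than a binary search.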

A “tree” can be used to represent vertical relationships
among data. 4 A tree may also be used for horizontal
organization of data, as shown in Figure 2. For
example, data about a firm’s operations might be broken
into divisions such as production, sales, engineering,
and the like. These divisions in turn can be broken
into subdivisions. For example, sales might be broken
into territories, and production into the product categories.
Engineering might incorporate new product categories
currently not in production, as well as those in
production. These categories can in turn be broken
still further. Thus in production they might be broken
by production equipment or in terms of a bill of material.
In sales they might be broken down into products
or into salesmen. In summary the term tree gets its
name from the graphic representation of the processes
of subdividing.

Figure 2 — A partial representation of a tree as a
horizontal organization for a file

By contrast, a string organization is viewed as a 
series of things, one after the other, where the ele- 
ments composing the series are similar. Examples of 
strings are series of characters, of digits, of names, 
or of numbers. 

A “list” is viewed as a series of records or data 
items, each accompanied by one or more pointers to 
other elements in the series. These pointers are here 
termed “links” and are themselves data items. Some 
people prefer the term “chain” to refer to a list. 
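A list in this sense can be sketched as records stored at arbitrary addresses, each carrying a link to its successor. The addresses, record contents, and the function name `traverse` are invented for illustration; a link of `None` is assumed to mark the end of the list.

```python
# Records stored at arbitrary addresses; "link" is itself a data item
# pointing to the next element of the list.
storage = {
    40: {"name": "SMITH", "link": None},
    75: {"name": "ADAMS", "link": 10},
    10: {"name": "JONES", "link": 40},
}

def traverse(store, head):
    """Follow the links from the head address and collect the names."""
    names = []
    address = head
    while address is not None:
        record = store[address]
        names.append(record["name"])
        address = record["link"]
    return names

def insert_after(store, address, new_address, name):
    """Insert a record by rewriting one link; no records are moved."""
    store[new_address] = {"name": name, "link": store[address]["link"]}
    store[address]["link"] = new_address
```

The logical sequence of the list is carried entirely by the links, so it need not agree with the physical order of the addresses.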

Irrespective of vertical or horizontal aspects of the 
file organization, a file may exhibit a simple or a 
compound organization. A “simple” organization has 
only one major structural pattern. A “compound”
organization has two or more distinct and different
structural patterns which taken together comprise the 
file organization. 

Classifications 

A number of people in the field have proposed
classifications of file organization. A brief review of
some of these will serve as a basis for selecting one
for use in making comparisons here.

A team headed by Anthony J. Dowkart has offered 
an extensive basis for comparison. 9 In summary, this 
basis is: the data definition provided, the facilities 
for file creation and maintenance, the retrieval mech- 
anism, the processing procedures, the output charac- 
teristics, and the operating environment. This basis of 
classification is concerned not with file organization 
alone, but also with data management in the broad 
sense. Looking at the matter of file creation and main- 
tenance, and of data definition, the classification bases 
suggested are performance oriented, rather than 
structure or pattern oriented. 

Richard G. Canning has suggested classifying file 
organization into two general classes based upon type 
and upon structure. 3 Within type he proposes recog- 
nizing sequential, indexed, and chained files. Within 
structure, he proposes recognizing linear, hierarchical, 
and involute files. These classifications are more 
structure and pattern oriented than those just cited, 
but they lack a consistently applied, obvious basis. 

Minker and Sable in reviewing data management 
systems suggested a basis of classification as user 
language, file structure, system processing capability, 
and user interface. 13 This again shares the same general 
user basis cited previously. Looking more particularly 
at the basis identified as file structure, Minker and 
Sable suggested classifying on the basis of the imple- 
menting storage media (such as tape or disk) and the 
variety of field and record lengths permitted. Among 
those that permit greater variety and which are disk 
based, Minker and Sable suggested a classification of 
indexed, tree-ordered, and linked, or chained. These 
suggestions share many of the features of those of 
Canning as noted earlier. 

David Lefkovitz has suggested a classification of 
file organization based upon a combination of the hard- 
ware and software components utilized to implement 
the file. 12 These he viewed from a functional point of 
view, particularly with regard to the retrieval process.
Thus a file organization may be classified on the basis 
of which software-hardware components it utilizes and 
in what way. For example, does it use a directory, 
does it use a randomizing or a tree approach? If it 
uses a tree approach, does it use a fixed length key or 
a variable length key? And so on. Such a basis of classi- 
fication results in a very large number of possible 
classes. In a sense, each non-identical existing file 
organization becomes a separate classification. 

Ned Chapin has suggested a classification scheme
based fundamentally upon the way of indicating as-
sociation at a given vertical level within a file. 4 At
one extreme he placed the attributed organization 
which provides explicit identification with the data 
at some given level. This obviates the necessity for 
providing a means of association below this level. 
At another extreme, he placed the linked or list organi- 
zation, where each data element at a given level in- 
corporates a specific indication of association. Two 
varieties of this he singled out for particular attention: 
the complex ring which is a complex list that forms 
closed loops, and the muble or multiple double-linked 
list which provides two or more links. At another 
extreme, he placed the hierarchical organization, which 
provides a tree-like association on a horizontal basis. 
Finally, at another extreme, he placed the positional 
organization. This provides association in terms of 
placement in relation to other data, at a given vertical 
level. Thus, field A is always known to precede field 
B, and field B is always known to precede field C,
and all three fields are always present in a record. 
Hence, values from the third field position have a 
known identification and association. 
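
The contrast between the positional and the attributed organizations can be illustrated by a small sketch. The field names, the fixed widths, the separator characters, and the sample values are assumptions made only for illustration; none of them is taken from the systems surveyed here.

```python
# Positional organization: the identity of a value is implied by where it sits.
# LAYOUT (names and widths) is a hypothetical record description.
LAYOUT = [("A", 4), ("B", 6), ("C", 3)]


def read_positional(record: str) -> dict:
    """Split a fixed-layout record into fields using position alone."""
    fields, start = {}, 0
    for name, width in LAYOUT:
        fields[name] = record[start:start + width].strip()
        start += width
    return fields


def read_attributed(record: str) -> dict:
    """Attributed organization: each value carries its own identification."""
    return dict(item.split("=", 1) for item in record.split(";") if item)


positional = "0042WIDGET 17"          # A, B, C always present, always in order
attributed = "C=17;A=0042;B=WIDGET"   # order is free; names travel with data
```

In the positional form the record is shorter, but a field can never be omitted or reordered; in the attributed form the identification is stored again with every value.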

The Chapin classification utilizes an important 
feature of the way people think about data, as its 
basis for classification. As such, it avoids the mixed-
base problems inherent in the other classification
schemes reviewed, without the gaps or holes charac-
teristic of the other systems.

This classification approach lends itself to a graphic 
representation, as diagrammed in Figure 3. The diagram 
uses time as the left to right distance, but not in strict 
scale units. 7 The vertices or nodes are the identity of 
data. The solid arcs or lines are the sequence of the 
active (pointed to) data. Vertically, the diagram has 
two parts, an upper or demand (D) part, and a lower 
or supply part. A perfect match of the file organization 
to the demands upon it occurs when the data (indi- 
cated by broken lines) demanded and supplied occur 
at the same time. 

Characteristics 

Figure 3 — A graphic representation of associations
showing the ideal pattern for a file organization

The point is well taken that users by and large are
unconcerned with the classification of a particular file
organization technique. They are concerned with the 
functional characteristics of the file organization tech- 
nique in action. Some of these of course are hard- 
ware and software dependent. But within those bounds, 
they are determined largely by the file organization 
itself. Among the common characteristics are the speed 
and basis of access, the use of storage capacity, the 
ease of maintenance (for insertions, alterations, and 
deletions), and the extent of software support avail- 
able. 

The speed and basis of access is fundamentally 
affected by the association provided in the file organi- 
zation because access uses the association for its reali- 
zation. The hardware, the software, and the associ- 
ation together set the limits. The basis of access may 
be by attribute, by value, or by property as has been 
pointed out elsewhere. 4 

The use of storage capacity reflects two aspects 
of file organization, each of which in turn rests upon 
the basis of association. One aspect is that compound 
organizations commonly use more storage capacity than 
do simple ones. Another is that hardware and software 
factors also affect the use of storage, given the file 
organization. 

The procedures, the convenience, and the time re- 
quired for maintenance operations, such as insertion, 
deletion, and alteration of data in a file, depend obvi- 
ously upon the hardware and software used. But they 
also depend importantly upon the association provided 
by the file organization, since maintenance involves 
access, but is more than access. Common mainte- 
nance practice is not always a corollary of the features 
of the file organization. 

The extent of software support available is a very 
significant determinant of the degree to which people 
are willing to use a file organization. Even if it be 
theoretically attractive, a file organization unsupported 
by software is in practice ignored in favor of anything
that is supported by debugged software. 

Common techniques 

Techniques covered 

The most common file organization techniques are 
those proselytized and supported with software by the 
computer vendors. These are normally part of the 
operating system and are accessible to anyone who 
programs in the symbolic language for a particular 
computer. Some of them are available to users of 
higher level languages such as COBOL and FORTRAN. 

Less commonly used are the file organization tech-
niques supported by software available from the com-
puter vendors but not provided normally as part of
the operating systems. These usually take the form of 
“packages” capable of a variety of functions. 

A third category comprises the file organization techniques
available in the software market from independent 
suppliers of software. None of these are as common as 
those available in the first category, but some are as 
common as some in the second category. 

For contrast, this paper looks also at the extensions 
to COBOL proposed to CODASYL in the area of file 
organization techniques. 

Vendor supported techniques 

Historically the oldest, the most popular, and by 
far the most common, is the strict sequential file 
organization. The strict sequential is a positional 
organized file commonly consisting of ordered records 
which are themselves positionally organized. 4, 10 As such,
its use of storage is the most economical of all. It is a 
simple, not a compound organization. 

The strict sequential enjoys a rapid next-record 
access by attribute, but a slow random access by attri- 
bute, as diagrammed in Figure 4. That is, as long as the 
sequence in which access is demanded conforms to the 
sequence in which the file was sorted, access is rapid 
unless the number of records to be passed over is large. 
Unfortunately, access is often desired on some other 
key. This requires first a reordering of the file which 
involves a time-consuming sorting operation, or an 
exhaustive search of the file. Even with this sorting
operation, access by value and by property involve
search.

Figure 4 — Diagram of the strict sequential file
organization

Maintenance for sequential files is logically straight- 
forward, but slow. It requires typically a complete 
passage of the file with a complete copying of it. Each 
record must be read and written in order to do mainte-
nance on the file. Because of this, insertions and dele-
tions are easily accomplished. Alterations are also sim-
ple as long as the typical fixed length restrictions on field
sizes are observed. Where variable length fields are per-
mitted, alterations become a little more complex but are
still logically straightforward. 
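
The complete copying pass described above can be sketched as a merge of the old file with a sorted batch of changes. The record shape, the three action names, and the rule of at most one transaction per key are assumptions of this sketch, not features of any particular vendor's software.

```python
def maintain_sequential(master, transactions):
    """One complete pass: read every old record, write a new file.

    master:       list of (key, data), sorted by key, keys unique.
    transactions: list of (key, action, data), sorted by key, where action
                  is "insert", "delete", or "alter" (hypothetical names).
    """
    new_master = []
    pending = iter(transactions)
    txn = next(pending, None)
    for key, data in master:
        # Insertions with smaller keys are written ahead of this record.
        while txn is not None and txn[0] < key:
            if txn[1] == "insert":
                new_master.append((txn[0], txn[2]))
            txn = next(pending, None)
        if txn is not None and txn[0] == key:
            action, new_data = txn[1], txn[2]
            txn = next(pending, None)
            if action == "delete":
                continue              # the record is simply not copied
            if action == "alter":
                data = new_data
        new_master.append((key, data))
    # Remaining insertions belong after the last old record.
    while txn is not None:
        if txn[1] == "insert":
            new_master.append((txn[0], txn[2]))
        txn = next(pending, None)
    return new_master
```

Every old record is read and every surviving record is written, whatever the number of changes, which is why the procedure is logically simple but slow.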

Software support for sequential organization is 
extremely good. Its popularity is attested by Table I. 
It is the most widely supported of all the file organi- 
zation techniques. 

The indexed sequential is a compound file organiza-
tion technique, historically younger than the strict
sequential. 4, 10 This too is a positional organization.
The main file is a strict sequential file. With it is a 
sequential organized index using the same key. Some- 
times indexes to indexes are provided depending upon 
the size of the main file and the storage space available. 

Random access for the indexed sequential file is 
superior in speed to the strict sequential because the 
index search requires less time than a search of the 
main file. From the index the location of the desired 
record can be found and the record then accessed with- 
out search. But for a next-record access, the same 
procedure usually is required, which slows such access 
(see Figure 5). Access by attribute, by value, and by 
property follow the same pattern as for the sequential 
organized file. 
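
A minimal sketch of random access through an index follows. It assumes a block-level index that holds the highest key of each block, with a short scan inside the block standing in for the final positioning; the block contents and the name `lookup` are illustrative only.

```python
from bisect import bisect_left

# Main file: blocks of records kept in key order (a strict sequential file).
blocks = [
    [(3, "c"), (7, "g"), (9, "i")],
    [(12, "l"), (15, "o"), (18, "r")],
    [(21, "u"), (24, "x"), (26, "z")],
]
# Index: the highest key in each block, in the same order as the blocks.
index = [block[-1][0] for block in blocks]


def lookup(key):
    """Search the small index first, then read only the indicated block."""
    b = bisect_left(index, key)
    if b == len(blocks):
        return None                   # key is beyond the end of the file
    for k, data in blocks[b]:
        if k == key:
            return data
    return None
```

The index is much smaller than the main file, so searching it costs less than searching the file itself; the price is that every access, including a next-record access, passes through the index.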

The use of storage space for the indexed sequential 
is larger because of the additional space required for 
the indexes. An added inefficiency in the use of storage
space is the typical requirement for overflow areas to
permit insertions in the main files. This overflow may
amount to as much as a third to a half more space for
the main file, although typically this can be held to
about one-tenth more space.

TABLE I — Summary of the file organization techniques
supported by the eight largest computer vendors

Strict          Indexed         Direct or
Sequential      Sequential      Random

IBM             IBM             IBM
RCA             RCA             RCA
CDC             CDC
UNIVAC          UNIVAC          UNIVAC
NCR             Burroughs       NCR
GE              NCR             GE
Honeywell       Honeywell       Honeywell

Figure 5 — Diagram of the indexed sequential file
organization

The maintenance of the indexed sequential file 
differs considerably from that for strict sequential. 
Maintenance does not require rewriting the entire file; 
only those specific records in the file that are altered 
are rewritten back into their same places. This saving 
in maintenance time can be more than offset by other 
factors. 

An insertion in an indexed sequential file requires 
that adjustments be made to the index and to the 
main file. The inserted record typically must be written 
in the main area displacing a record into the overflow 
area. Links are inserted if more than one such overflow 
occurs in a given area. By contrast, deletion is
simpler. The record to be deleted is simply marked for
deletion but is not physically deleted from the file nor 
from the indexes. Periodically, the entire file is re- 
written in order to eliminate the accumulated deletions, 
to pull the insertions into the main sequence, to re- 
apportion the overflow areas, and to clean the index. 
In sum, whether or not the maintenance time for an 
indexed sequential file exceeds that for a strict se- 
quential file depends upon the volume of insertions and 
alterations. For low to moderate volume, the strict 
sequential is usually slower over-all. An indexed se- 
quential suffers from the same single-key limitations 
as the strict sequential. 
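
The maintenance steps just described can be sketched for a single block of the main file. The class name, the fixed capacity, the deletion marker, and the rule that the highest-keyed record is the one displaced are all assumptions of this illustration.

```python
DELETED = object()  # hypothetical marker for a record flagged as deleted


class Block:
    """One main-area block of fixed capacity plus its overflow chain."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.prime = []       # records in key order
        self.overflow = []    # displaced records, reached through links

    def insert(self, key, data):
        self.prime.append((key, data))
        self.prime.sort(key=lambda r: r[0])
        if len(self.prime) > self.capacity:
            # The highest-keyed record is displaced into the overflow area.
            self.overflow.append(self.prime.pop())

    def delete(self, key):
        # Mark only; nothing is physically removed until reorganization.
        for area in (self.prime, self.overflow):
            for i, (k, _) in enumerate(area):
                if k == key:
                    area[i] = (k, DELETED)
                    return True
        return False

    def reorganize(self):
        """Periodic rewrite: drop deletions, pull overflow into sequence."""
        live = [r for r in self.prime + self.overflow if r[1] is not DELETED]
        return sorted(live, key=lambda r: r[0])
```

Between reorganizations the deleted records and the overflow chains both accumulate, which is the source of the storage and access penalties noted in the text.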

The software support for indexed sequential generally 
is good. The software operates more slowly per random
access than for strict sequential because of the de- 
creased buffering possible. 

The direct or random file organization is also a
positional organization. 4, 10 It is like strict sequential
in that it is simple, not compound. The direct or
random file organization is a variation of the strict
sequential. It uses a transformation of the key. What-
ever the key would be is passed through an algorithm
to calculate a position in storage. Because of the
possible occurrence of multiple records having the same 
key, or of closely spaced keys, provision is made in the 
algorithm to handle some conditions. One is to place or 
find a record when its transformed key is the same as 
another transformed key. This can be handled by links 
and overflow areas, or by shifting records to maintain 
a sequence in order to restrict the search domain. An- 
other is to set up the initial spacing of records in the 
file to permit room for the later insertions. The amount 
of storage space allocated for this purpose is usually 
not less than that allowed for overflow areas in the 
case of an indexed sequential file. 
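
As an illustration of key transformation with links and an overflow area, the following sketch uses the division-remainder method, which is one common choice and not necessarily the one used by any vendor. The slot count and the record layout of (key, data, link) are assumptions.

```python
NUM_SLOTS = 7  # hypothetical size of the main storage area


def transform(key: int) -> int:
    """Key transformation: division-remainder, one common choice."""
    return key % NUM_SLOTS


prime = [None] * NUM_SLOTS   # each slot holds (key, data, link) or None
overflow = []                # overflow area, reached through links


def store(key, data):
    slot = transform(key)
    if prime[slot] is None:
        prime[slot] = (key, data, None)
        return
    # Duplicate transformed key: append to overflow and link it in.
    overflow.append((key, data, None))
    new_pos = len(overflow) - 1
    k, d, link = prime[slot]
    if link is None:
        prime[slot] = (k, d, new_pos)
        return
    while overflow[link][2] is not None:   # walk to the end of the chain
        link = overflow[link][2]
    k, d, _ = overflow[link]
    overflow[link] = (k, d, new_pos)


def fetch(key):
    record = prime[transform(key)]
    while record is not None:
        k, data, link = record
        if k == key:
            return data
        record = overflow[link] if link is not None else None
    return None
```

No index is consulted, so an access to an uncrowded slot takes a single reference; each duplicate transformed key adds one link to follow.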

The random access provided by the direct or random 
file organization is slightly faster than that for an 
indexed sequential organized file, since no index refer- 
ence is needed. But for next-record access, it is slower 
because the transformed key order is not the same as 
the ordinary key. Hence, every access is a random 
access, as diagrammed in Figure 6. The access basis is
the same as noted earlier for the positional organized 
files. Also, only one key can be used, as noted earlier. 

The use of storage space for the direct or random 
file organization is about as efficient as that for the 
indexed sequential, and is less efficient than for the
strict sequential. This is because of the voids that 
must be left in the spacing of the records to accommo- 
date inserts, and the use of overflow areas. No space 
is needed for an index. 

The maintenance for a direct or random organized 
file resembles the indexed sequential more than the 
strict sequential. This may also extend to alterations
and deletions. For insertions, no index needs to be ad-
justed. If the record to be inserted must go into a place
that is already occupied (that is, the transformed key
is a duplicate of an already existing transformed key)
then provision must be made for moving records or for
use of overflow area and links.



The software support for the direct or random file 
organization is less troublesome and less burdensome 
than that for the indexed sequential. Also, less sup- 
porting software is needed to accomplish the job. The 
user does not even need to rely upon manufacturer-provided
software but can make do by providing his
own algorithm for key transformations and by using a 
strict sequential file organization. Many vendors have 
been supplying this software for a longer period of 
time than they have supplied indexed sequential soft- 
ware. 

Another type of common file organization technique 
available from the computer vendors and incorporated 
as a normal part of their operating systems is the par-
titioned file organization. 4, 10 This is a hierarchical file
organization. But it is normally not accessible to the 
programmer even though it is utilized routinely by the 
operating system for its own functions such as program 
libraries. Typically, the hierarchical file organizations 
are compound because they require directories and 
sometimes even hierarchies of directories to maintain 
association and provide access. These directories usu- 
ally include one that is of the table of contents type. 

Access by attribute is the most common. The speed 
of access depends mostly upon the size and number of 
directories used (see Figure 7). Maintenance is usually 
done by making deletions by altering only the director- 
ies. Insertions are entered in the directories and the 
new data placed in any available space. Alterations 
are often treated as combined deletions and insertions. 
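
The directory-only style of maintenance can be sketched as follows. The class and method names are hypothetical, and new data is always placed at the end of the storage area, which is a simplification of "any available space."

```python
class PartitionedFile:
    """Members located through a directory of the table of contents type."""

    def __init__(self):
        self.space = []        # storage area holding all member records
        self.directory = {}    # member name -> (start, length)

    def insert(self, name, records):
        self.directory[name] = (len(self.space), len(records))
        self.space.extend(records)

    def delete(self, name):
        # Only the directory is altered; the old data remains as dead space.
        del self.directory[name]

    def alter(self, name, records):
        # Treated as a combined deletion and insertion.
        self.delete(name)
        self.insert(name, records)

    def read(self, name):
        start, length = self.directory[name]
        return self.space[start:start + length]

    def dead_space(self):
        live = sum(length for _, length in self.directory.values())
        return len(self.space) - live
```

Because deletions and alterations never reclaim storage, the dead space only grows until the whole file is rewritten and the directories are recreated.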

The software support is usually inadequate to enable 
the use of the partitioned file organization by pro-
grammers in their own programs. The organization
becomes increasingly uneconomical of storage space as
deletions accumulate. To eliminate them requires re-
writing the entire file and recreating the directories,
an operation equivalent to that needed for the indexed
sequential file organization.

Figure 6 — Diagram of the direct or random file
organization

Figure 7 — Diagram of the partitioned file organization

TABLE II — Summary of selected vendor
augmentations

Strict          Indexed         Direct or       Other
Sequential      Sequential      Random          Techniques

GIS             GIS             GIS             IDS (ring)
FORTE           UNIMS           FORTE           FORTE (list)
MARS            FORTE
                MARS
                UL/1

Vendor augmentation 

Computer vendors over the years have made a 
number of augmentations and elaborations of the imple- 
mentation of file organizations just compared. The 
best known of these are listed in Table II. 

One of these has been IBM's GIS (Generalized In-
formation System). 2 This elaboration provides a number
of features that add greatly to the power and con- 
venience available to the user. Underlying it are the 
two positional organized file organizations, the strict 
sequential and direct or random. The use of indexed 
sequential is optional depending upon the scope of the 
GIS implemented. GIS is a free-standing package, not 
an extension of COBOL, but GIS can be used with 
COBOL. 

The access for the GIS is slower because of the 
additional software. But that software yields greater 
convenience of user access by reducing programming 
effort to file and retrieve data. The use of storage 
space is but little more extensive, ignoring the space 
for the additional software. Maintenance follows the 
usual procedures but is more convenient from the user's 
point of view because he does not need to write all of 
the programs for doing it. The software support is 
comprehensive. 

The Integrated Data Store (IDS) is available from 
General Electric, 1 and is similar to the General Motors 
Associative Programming Language. IDS offers a com- 
plex ring file organization where the number of links 
possible at the record level in the file may be made as 
extensive as the user desires. In practice, it is used 
most often as an extension of COBOL. 

Access by attribute beyond the first access is slightly 
facilitated because of the links. Access by property is 
much facilitated as a practical matter because of the 
links which provide quick reference to the records with 
related contents. The use of storage space is greater
than for a strict sequential organization because of the
space occupied by the links. Since in practice, director- 
ies are used to locate or serve as pointers to rings, a 
little additional storage is also needed for them. 

Although insertions, deletions, and alterations are 
handled by the software, the procedures are considera- 
bly more complicated for IDS than for the positional 
organized file. This is because of the need to adjust 
the links whenever insertions and deletions are made. If 
the insertion cannot be made physically nearby, then 
subsequent accesses following the links are slowed. 
This maintenance problem compounds as the number 
of links to be adjusted increases. The software support 
available for IDS is comprehensive and has been ex- 
tensively tested in use. 
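
The link adjustments that make ring maintenance costly can be seen in a small sketch. This is a generic ring structure written for illustration, not the IDS implementation; the class, the function names, and the sample record names are all hypothetical.

```python
class Record:
    def __init__(self, name):
        self.name = name
        self.next = {}   # ring name -> next Record on that ring


def make_ring(ring, owner):
    owner.next[ring] = owner   # an empty ring points back to its owner


def insert_after(ring, position, new):
    new.next[ring] = position.next[ring]
    position.next[ring] = new


def remove(ring, owner, target):
    # The predecessor must be found by following links around the ring.
    prev = owner
    while prev.next[ring] is not target:
        prev = prev.next[ring]
        if prev is owner:
            raise ValueError("record is not on this ring")
    prev.next[ring] = target.next.pop(ring)


def members(ring, owner):
    out, rec = [], owner.next[ring]
    while rec is not owner:
        out.append(rec.name)
        rec = rec.next[ring]
    return out
```

A record that belongs to several rings carries one link per ring, so each insertion or deletion repeats the adjustment once for every ring involved.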

The UNIMS (Univac Information Management 
System) is available from the Univac Division of 
Sperry Rand. It offers a modified indexed sequential 
file organization in a package of software, in a similar 
manner to that noted earlier for GIS and IDS. It too 
can serve as an extension to COBOL. 

The access and maintenance for UNIMS are similar 
in character to that noted earlier for indexed sequential 
files. But to the user the procedures appear easier 
because of the assistance provided by the software. 
UNIMS uses little more storage space than the in- 
dexed sequential noted earlier. The software support 
is comprehensive. 

The UL/1 (User Language/1) from RCA offers a
more convenient language for the handling of access, 
maintenance, and reports from files than the usual 
programming languages. As such it has similar ob- 
jectives to GIS noted earlier. UL/1 uses a modified 
indexed sequential file organization in a way that gives 
the appearance of a hierarchical file organization. 11 The 
characteristics of this software system were still fluid 
at the time of this paper. 

FORTE is available from Burroughs Corporation. 
It provides unordered (sequential), indexed sequential, 
random, and a combination of indexed sequential and
random. Further, it provides list file organization in 
two forms, a two-cell list, and a usual double-linked 
list (but not a multiple-linked list or ring structure). 4, 14
As such it represents an improvement over the FORGE 
software which Burroughs has offered. FORTE is de- 
signed for use as an extension of COBOL, not as a 
free standing software package for file organization 
and use. 

Another relatively new entry in the field is MARS 
from CDC. In giving the user the appearance of a 
range of file organizations, it, like UL/1, relies primarily
upon the strict sequential and indexed sequential file
organizations. Like GIS noted earlier, MARS is a
generalized system providing access, maintenance, and 
report capabilities. It does however provide the capa- 
bility of building an inverted list organization. Its 
characteristics were still fluid at the time of preparing 
this paper. 

Non-vendor augmentation 

A number of implementations of file organization
alternatives are available in the software market from
sources other than computer vendors. With IBM’s
Summer 1969 announced changes in software policy,
the number of alternatives can be expected to grow still
larger. Only a brief selection is covered here, based
primarily on age and popularity (see Table III).

Two distinct classes of offering are available in the 
software market. One uses and elaborates upon the 
vendor provided file organization and software support. 
Another replaces the vendor provided file organization 
and hence also provides its own software. A brief look 
at each of the groups will round out the comparison, 
since these offerings may soon become more popular 
in the market. 

In the first group, some of the best known are the 
MARK-IV, FILE EX, SCORE-II, and INQUIRE.
The first two of these use the vendor-provided strict 
sequential and indexed sequential file organization tech- 
niques. To these they add an important software super- 
structure for report preparation, data retrieval, and file 
maintenance. As such they provide an alternative to 
the user for preparing his own programs to accomplish 
similar ends, and to the use of the vendor-provided 
software. 

The SCORE-II also uses the vendor-supported se- 
quential and an indexed sequential file organization. In 
addition it also provides tree structure, not directly 
but based upon a combination of the strict sequential 
and indexed sequential. This adds flexibility to the
package of report preparation, retrieval, and mainte-
nance facilities.

TABLE III — Summary of selected non-vendor
augmentations

Strict          Indexed         Direct or       Other
Sequential      Sequential      Random

MARK-IV         MARK-IV                         DM-5 (hierarchy)
FILE EX         FILE EX                         SCORE-II (tree)
SCORE-II        SCORE-II                        INQUIRE (list)

Differing in its choice of the underlying file organi- 
zation is INQUIRE. This utilizes the indexed sequen- 
tial and the direct or random file organizations. But 
these are not directly accessible to the programmer. 
Rather, INQUIRE combines them to form a modifi- 
cation of an inverted list file structure.* This gives 
added power to the file maintenance, retrieval, and 
report capabilities of INQUIRE. Access by attribute 
and by property is facilitated by the inverted list 
organization, but maintenance requires adjustment of 
the lists as additional operations. 4 
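
A generic inverted list can be sketched as a main file plus one list of record addresses for each attribute and value pair. This is an illustration of the idea only and makes no claim about how INQUIRE is built; the function names and the sample records are assumptions.

```python
from collections import defaultdict

main_file = {}                  # record address -> record (a dict of fields)
inverted = defaultdict(set)     # (attribute, value) -> set of addresses


def insert(address, record):
    main_file[address] = record
    for item in record.items():     # maintenance: adjust every list touched
        inverted[item].add(address)


def delete(address):
    record = main_file.pop(address)
    for item in record.items():
        inverted[item].discard(address)


def find(**conditions):
    """Addresses of records satisfying all attribute=value conditions."""
    lists = [inverted.get(item, set()) for item in conditions.items()]
    return set.intersection(*lists) if lists else set(main_file)
```

Retrieval on any combination of keys reduces to intersecting short lists, while each insertion or deletion must touch one list for every field of the record.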

In the second group, the oldest and most publicized 
entry is the DM-5 (Data Manager-5) which has been 
described in the literature of the field. 8 DM-5, like the
others, includes the software for retrieval, maintenance,
and report preparation. DM-5 utilizes a hierarchical
file organization of a compound form. Tables are used
at several levels. Both random and next-record access
are handled by use of the tables, and are of about equal
speed for access by attribute. Since the records are 
not ordered by a key, but many keys can be used in the 
construction of the tables, the single key restriction of 
the positional file organization is avoided with a result 
similar to that for the inverted list file organization. 

In summary, the non-vendor offerings in the soft-
ware market typically combine into a single package
both file organization and convenient aids to using it.
The offerings thus far do not attempt to replace the
file organizations supported by the computer vendors.

COBOL extensions 

The Data Base Task Group proposed last year to
the CODASYL COBOL Committee an extension of
COBOL to incorporate provisions for the complex ring
file organization. 6 Although the discussion devotes con-
siderable attention to the other file organization tech-
niques, the proposal is for the inclusion of only one of
them, the complex ring. In substance, this is very
similar to the IDS noted earlier. The discussion in-
cluded with the proposal indicates that ring file organ-
ization can be used to simulate or serve as other file
organizations, such as sequential, random, hierarchical
or tree, and inverted file. Although not presented in
the discussion, it can also be used for muble chains
or a multilist file organization.

* The inverted list was developed about 1964 under the leader-
ship of Dr. Jack Minker as a modification of the inverted file.
The inverted file organization was in use in the information
retrieval field in the years 1957-1958. The inverted file is a posi-
tional file organization with an ordering determined by multiple
keys. Records in the file reoccur as many times as they may have
keys, which need not be the same from record to record. By
contrast, an inverted list is a list file organization of a compound
form. The main portion of the file need not be and usually is not
in a list form. The key portion of the file is organized as a set of
lists consisting of pointers for each key to records in the main
file. Since as a practical matter, the links are unnecessary, com-
mon practice is to elide them. The result is conceptually equiva-
lent to an inverted file with all records replaced by surrogates
(a common practice now), and with the records drawn into a
subfile of their own with no redundancy. (The inverted list can
also be viewed as resulting from a consolidation of the links in
one direction from a muble chain or multilist file. 4, 12) In net
effect in their modern forms, and as a practical matter, an in-
verted list differs from an inverted file primarily in emphasis and
manner of implementation.

One of the major objectives of the Data Base Task 
Group was to work toward keeping the description of 
data stored with the data itself. This is, in effect, an
attempt to delay binding time. Since delayed binding 
time in general improves the flexibility and power of 
the resources available to the programmer, the ob- 
jective is commendable. Providing linkage among data 
can be a definite step in this direction. The question 
to be argued is whether or not the ring file organization 
is the best choice of means for accomplishing this 
objective as well as serving as a worthwhile extension 
of COBOL. 

From the comparisons presented, it can be argued 
that replacing a ring file organization by a frankly 
compound file organization sans links, would gain more 
for COBOL. Examples of candidate file organizations 
are the inverted list and the hierarchical. Access for 
both is faster and more powerful; maintenance for 
both is simpler. 

CONCLUSION 

Automatic computers during the middle and late 
1950’s had, by present-day standards, relatively slow
execution times and great restrictions upon the availa- 
bility of both internal and external storage. The trend 
has been toward increasing the availability of larger 
and larger amounts of storage capacity, and toward 
faster and faster operating speeds. 

These changing computer capabilities suggest the 
desirability of seriously rethinking the historic prefer- 
ence for positional organized files. This was certainly an 
appropriate choice of file organization technique, when 
storage capacity was extremely limited and operating 
speed was slow. It required the least storage space and 
the least direct overhead within the program at the 
time of file use. The positional organized file entails a 
very heavy cost of additional operating time in order 
to reorder (sort) the file. It also involves the time to 
rewrite the file periodically as a part of the mainte-
nance of the file, depending for its extent upon the form
of the positional file organization. 

Now that computers have much more extensive ex- 
ternal and internal storage capacity and operate more 
rapidly, it appears appropriate to reappraise our con- 
tinued reliance upon positional file organization tech- 
niques. Let us consider briefly the alternatives. The 
attributed file organization is still too expensive of 
storage space and of machine time for serious attention 
in pure form. The list file organizations in general 
suffer from costly maintenance. The exception is the 
inverted list. The hierarchical file organizations appear 
attractive, but like the inverted list, are in practice 
compound file organizations. 

It is significant that these latter two file organization 
techniques are generally not available to computer 
users because the supporting software is not generally 
available. The software exists, but the form of most 
puts it beyond the reach or scope of operations for 
most computer users. But this gap is narrower now 
than it was. Some vendors such as CDC and Burroughs 
have started to move to provide a wider range of file 
organization techniques. Independent software firms
are starting to offer a wider variety of alternatives. 
But a gap still exists. 

An information retrieval system based
on superimposed coding* 

by JOHN R. FILES and HARRY D. HUSKEY 

University of California
Santa Cruz, California 


The cost of storing information in machine-accessible 
form has declined markedly in the last decade, and 
promises are such that one can look forward to having 
complete libraries available in such form. This places 
increased importance on algorithms which make it 
possible to search large files efficiently. 

This paper describes an approach to this problem. 

In practice, information in a large file can be more 
efficiently accessed if it is indexed in some manner. The 
method of indexing which will be discussed is particularly
well suited for a file which:

1. Is very dynamic with both deletions and ad- 
ditions frequently occurring. 

2. Contains an extensive vocabulary which is to 
be encoded. 

Both of these characteristics are frequently found in 
files that are to be coded. A file of information on re- 
cently published articles about a given subject and a 
card catalogue for a large library are good examples 
of files which require a large amount of maintenance. 
If updating the index (code file) is expensive and time- 
consuming, updating is put off until it is felt that the 
performance of the system has deteriorated enough to 
justify the effort required to update it. Until the up- 
dating takes place, information which is no longer of 
use is still retrieved, and the new information, if present, 


* The research reported on here was done at the University of 
California at Santa Cruz with the partial support from Project 
Genie at the University of California at Berkeley (Contract 
SD-185 with The Advanced Research Projects Agency of the 
Department of Defense). 


is in a secondary file. Keeping a secondary file con- 
taining recent additions avoids the serious problem of 
not having new material available, but it does decrease 
the efficiency of the system since such a file must be 
searched separately each time an inquiry is made of 
the main file. 

The ability to utilize an extensive vocabulary is 
also very important. In the proposed system the vo- 
cabulary to be used is derived directly from words 
used in the original documents, thereby eliminating 
the time-consuming and expensive practice of manually 
abstracting and choosing indexing terms. Machine- 
generated derivatives of the original vocabulary retain 
more information about the original content of the 
item than does the manual system of assigning de- 
scriptors. In the manual case when selected descriptors 
are assigned to a document, associations of descriptors 
to words and to phrases are made. Such associations 
are not made in exactly the same manner by two 
trained indexers, and it is likely that the associations 
made by the average interrogator of an information 
retrieval system will be even more diverse. Because of 
this lack of uniformity in assigning descriptors it is 
desirable to allow each searcher to determine words 
and phrases that he wishes to associate with the concept 
on which he is doing a search. Postponing such asso- 
ciations until the time of the search can be accomplished 
only if the entire word content is preserved in the 
coded form. 

Ease of update and freedom of vocabulary are not
enough in themselves to make a coding algorithm
worthwhile. Factors such as speed of access, ability
to make searches for combinations of words, and compactness
of code file are also important considerations.
All of these characteristics will be discussed for the 
coding scheme outlined below. 

The system 

The information retrieval system which was in- 
vestigated can be divided into three components: 
preparation of the text, generation of the code file, 
and the searching procedure. A general outline of the 
first two components can be seen in Figure 1. 

Since the form and format of the text to be used 
can be expected to vary greatly, the text is standardized
as it is read in. Flags are set to indicate boundaries
between records as well as at the ends of lines to make 
it easier to reproduce the document when it is retrieved. 
Also, as a measure to reduce the bulk of the file gener- 
ated (text file) extra blanks in the input text are re- 
moved. In the pilot system the text file was generated 
from two sources: a bibliography of computer science 
and a listing of authors and titles from recent issues of 
The Computer Group News of the IEEE. Both of these 
texts were read, processed, and stored on a disk. The 
text file generated was 100,000 characters stored one 
character per byte. 

Once the text file is generated coding can proceed. 
The text file is examined character by character until 
the end of a string which is to be coded (word) is encountered.
The unit coded is a string of at least three
alphabetic characters surrounded by non-alphabetic
symbols (an English word). After the word is found it
is compared with a list of non-content words (i.e., the
Delete List containing words such as: of, the, and and).
If the word is found in the Delete List there is no further
processing of that word, and the next word is considered.
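
As a minimal sketch of the word-finding step just described, the following Python fragment picks out runs of at least three alphabetic characters and drops those found in the Delete List. The names `DELETE_LIST` and `content_words`, the folding of words to lower case, and any Delete List entries beyond "of", "the" and "and" are assumptions made for illustration; they are not taken from the pilot system.

```python
import re

# Hypothetical Delete List; the text names only "of", "the" and "and".
DELETE_LIST = {"of", "the", "and"}

def content_words(text):
    """Return the words of `text` that would go on to trimming and coding."""
    # A codable unit is a run of at least three alphabetic characters
    # bounded by non-alphabetic symbols; shorter runs are skipped.
    words = re.findall(r"[A-Za-z]{3,}", text)
    # Lower-casing is an assumption so that 'The' and 'the' compare equal.
    return [w.lower() for w in words if w.lower() not in DELETE_LIST]
```

Applied to the title of this paper, the function keeps "information", "retrieval", "system", "based", "superimposed" and "coding", and skips "An" and "on" because they are shorter than three letters.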

When a word is found that is not in the Delete
List, the trimming algorithm is applied to reduce the
word to a pseudo-root. Common endings such as s,
ed, ing and compound endings such as fully (as in
carefully) are removed. By removing endings, different
forms of the same word are made into synonyms. For
example, the words ‘computer’ and ‘computers’ will
both be reduced to the base ‘comput.’ This derived
root is then passed on to the coding procedure. (Further
discussion of the trimming algorithm is in Appendix C.)

In the coding procedure, a code word is generated for
each record. The code word can be thought of as a
bit string containing N bits, all of which are initialized
to zero at the beginning of the coding operation. When
a trimmed word is to be coded into the code word, the
numeric value of the letters in the word is summed,
giving a number which is used to choose an element
from the uniform distribution of integers between 1
and N. Thus the resultant integer (code value of the
word) is generated by an algorithm which, given the
same trimmed word in the future, will generate the
identical code value for that word. By using a fixed
arithmetic procedure to produce the code value for a
word, the need for a dictionary of words and assigned
code values disappears. This frees the large amount of
storage which such a dictionary would occupy as well
as saving the time required to search such a file. If
for a particular word the code value generated is K,
then the K'th bit in the code word is set to one (Figure 0).

Figure 1 — Coding procedure

Figure 0 — Coding “An information retrieval system
based on superimposed coding”
(Surviving entries: 9; SUPERIMPOS(ed) 22151 7; COD(ing) 87008 3.
CODE WORD 0001000101000101, bit positions 0 through 15.)

The entire operation of finding a word, checking the 
Delete List to see if it should not be coded, trimming 
and coding is repeated until the entire record is pro- 
cessed. The code word which is uniquely determined 
by the words in the record is then stored in a file (code 
file) along with a pointer to the beginning of the record 
in the text file. This procedure is repeated until all 
the records have been coded. 
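
The coding loop above might be sketched in Python as follows. The paper does not give the arithmetic that turns the letter sum into a code value, so seeding a pseudo-random generator with the sum is only one possible reading. The letter values (a = 1 through z = 26), the bit numbering, and the names `code_value`, `code_word` and `build_code_file` are likewise assumptions.

```python
import random

N = 16  # bits per code word; 16 matches the figure, but any N could be used

def code_value(trimmed_word, n_bits=N):
    """Map a trimmed word to a repeatable integer between 1 and n_bits."""
    # Sum the numeric values of the letters (a=1 ... z=26 is an assumption).
    letter_sum = sum(ord(c) - ord("a") + 1 for c in trimmed_word.lower())
    # Seeding with the sum makes the same word always yield the same value.
    return random.Random(letter_sum).randint(1, n_bits)

def code_word(trimmed_words, n_bits=N):
    """Superimpose the code values of a record's words into one bit string."""
    bits = 0
    for word in trimmed_words:
        k = code_value(word, n_bits)
        bits |= 1 << (k - 1)  # set the K'th bit (bit 1 is the low-order bit here)
    return bits

def build_code_file(records, n_bits=N):
    """`records` is a list of (pointer, trimmed words) pairs.

    Returns one (code word, pointer) pair per record."""
    return [(code_word(words, n_bits), pointer) for pointer, words in records]
```

No dictionary of words is kept: the code value is recomputed from the word whenever it is needed. One consequence of summing letters, at least in this sketch, is that words made of the same letters in a different order receive the same code value.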

After coding, the file is ready for searching. The 
searching program accepts any number of words, each 
of which is processed in the same manner as the words 
in the text file. It is looked for in the Delete List, 
trimmed, and used to generate a code value. This code 
value is then used to produce a query code in exactly 
the same way as the code words were produced in 
the code file. Upon generation of the query code the 
actual search may begin. Each code word in the code 
file is matched against the query code to see if the 
query code is a subset of it. (Here a bit string X is 
said to be a subset of another, Y, if when the I'th bit
in X is one, the I'th bit in Y is also one, i.e., 1010 is a
subset of 1011 while 0101 is not.) Each time that the 
query code is a subset of the code word, the pointer to 
the text file is used to gain access to the corresponding 
record which can be further processed to see not only 
if it contains the relevant words, but that the words 
are in the correct order. 
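
The subset test and the serial scan of the code file can be illustrated with a short Python sketch. Code words are held as integers, and the code file as a list of (code word, pointer) pairs; both representations, and the function names, are assumptions for the sake of the example.

```python
def is_subset(x, y):
    """True if every one bit of x is also a one bit of y."""
    return (x & y) == x

def search(code_file, query_code):
    """Scan the whole code file and return the pointers of candidate records."""
    return [pointer for code, pointer in code_file if is_subset(query_code, code)]

# The example from the text: 1010 is a subset of 1011, while 0101 is not.
print(is_subset(0b1010, 0b1011))  # True
print(is_subset(0b0101, 0b1011))  # False
```

The pointers returned identify candidates only; each record still has to be read from the text file to confirm that the words are present and in the correct order.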

The above is a brief description of the coding sug- 
gested for a file of an information scanning program. 
Some details such as the exact procedure for removing 
endings and the use of several independently generated 
code values to produce multiple code words for a given 
record, were not dealt with here. A more detailed 
treatment of these problems can be found in the ap- 
pendix. 


Figure 2 — Inverted file (vocabulary file, occurrence file, text file)


Results 

From the pilot system, data was gained on the per- 
formance of such a system of superimposed coding. 
When possible, the performance of the superimposed 
coding system will be compared with that of a threaded 
list and inverted file. (Figures 2 and 3) The following 
factors received major consideration: 

1. Ease of update 

2. Effect of a large vocabulary 

3. Amount and type of storage 

4. Speed of search 

5. Cost 

Before making any comparisons it would be best to 
give a brief description of threaded lists and inverted 
files. An inverted file consists of two main parts, a 
vocabulary file and an occurrence file. As records are 
processed, each significant word is looked up in the 
vocabulary file. If the word has appeared before, it 
has associated with it a pointer to an area in the occurrence
file; if not, then an area in the occurrence file
is set aside for the word and a pointer to the first 
location in that area is entered in the vocabulary file. 
After this pointer is found, an entry is made in the 
first free location in the corresponding area of the 
occurrence file to indicate the record in which the 
word occurred. 

The threaded list, on the other hand, has the same
type of vocabulary file, but the occurrence file is arranged
in a different manner. The pointer in the vocabulary
file now indicates a location associated with
the first record containing the given word. This location
in the occurrence file, in turn, contains a pointer
to another location in the occurrence file associated
with the second record which contains the word, and
the pointer in this location points . . . Thus a linked
list of all the occurrences of the word is formed. 2
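
To make the contrast concrete, here is a small Python sketch of the two structures. In-memory dictionaries and lists stand in for the vocabulary and occurrence files, and the helper table `last`, which remembers the most recent cell for each word, is an implementation convenience assumed here rather than something described in the text.

```python
from collections import defaultdict

def build_inverted_file(records):
    """`records` is a list of (record id, words). Returns word -> record ids."""
    occurrences = defaultdict(list)
    for rec_id, words in records:
        for word in set(words):
            occurrences[word].append(rec_id)  # next free location in the word's area
    return dict(occurrences)

def build_threaded_list(records):
    """Returns (vocabulary, cells).

    vocabulary maps a word to the index of its first cell;
    each cell is [record id, index of the next cell or None]."""
    vocabulary, last, cells = {}, {}, []
    for rec_id, words in records:
        for word in sorted(set(words)):
            cells.append([rec_id, None])
            idx = len(cells) - 1
            if word in vocabulary:
                cells[last[word]][1] = idx  # link the previous occurrence to this one
            else:
                vocabulary[word] = idx      # first record containing the word
            last[word] = idx
    return vocabulary, cells

def follow_thread(vocabulary, cells, word):
    """Walk the linked list of occurrences for `word`."""
    found, idx = [], vocabulary.get(word)
    while idx is not None:
        rec_id, idx = cells[idx]
        found.append(rec_id)
    return found
```

In the inverted file all pointers for a word sit together in one area, whereas in the threaded list they are scattered and reached by following links from the first occurrence.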

1. Ease of update 

In the proposed system a record can be added or 
deleted very easily. To delete a record a search is 
performed which will retrieve the desired document. 
This produces not only the pointer to the record in 


the text file but the location of the record’s code in 
the code file. The code word and pointer are removed 
from the code file, and their location is recorded as 
being free to be used for a new entry to the code file. 
The space that the text was occupying in the text file 
is now also free to contain new text. In order to add a 
record, which is the more common situation, the text 
of the new record is added to the text file in the first 
free location of a suitable size or at the end. It is then 
processed in the same manner as all the other records 
have been. The generated code word and pointer are
inserted in the first free space in the code list. Here 
no room is wasted since all of the code word and pointer 
combinations are of the same length. Thus any type 
of update in the code file will affect only the code for 
the record which is being changed. 
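
The update procedure for the code file might look like the following Python sketch. Because every entry has the same length, a freed slot can always take a new entry. The class name `CodeFile`, its methods, and the use of a sorted list of free slot indices are assumptions made for illustration.

```python
class CodeFile:
    """Fixed-length (code word, pointer) slots with reuse of freed slots."""

    def __init__(self):
        self.slots = []  # each slot holds (code word, pointer), or None when free
        self.free = []   # indices of free slots, kept in ascending order

    def add(self, code, pointer):
        """Insert an entry in the first free slot, or at the end if none is free."""
        if self.free:
            index = self.free.pop(0)
            self.slots[index] = (code, pointer)
        else:
            self.slots.append((code, pointer))
            index = len(self.slots) - 1
        return index

    def delete(self, index):
        """Remove an entry and record its location as free."""
        self.slots[index] = None
        self.free.append(index)
        self.free.sort()
```

Only the slot of the record being changed is touched; no other entry in the code file is moved or rewritten.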

The threaded list can be updated with slightly more 
effort. The problem, and a minor one, is that the 
records in the occurrence file are not all of the same 
length, making it necessary to see if there is enough 
room in a given free area to insert the new entry. 

The inverted file on the other hand is far more diffi- 
cult to update than either of the others. If a record is 
to be removed all that need be done is to delete all 
pointers to it in the occurrence file. The addition of a 
record however becomes a serious problem. If for every 
word in the record there is room for an additional 
pointer in the areas set aside for pointers to records 
containing that word, then the update is easy. But if 
there is no room, a secondary file must be set up. The 
number of such files will grow until it is felt that a 
thorough update should be made. Then the entire text 
file must be re-inverted to produce a new vocabulary 
and occurrence file. This is a very time-consuming and
expensive project. 

2. Effect of a large vocabulary 

With the superimposed coding there is no problem 
associated with having an arbitrarily large vocabulary. 
This is true because the superimposed coding does not 
require a table of vocabulary words like the inverted 
and threaded list files do. Since the vocabulary file is 
not present and does not have to be searched, increasing 
the vocabulary neither lengthens the time required for
a search nor increases the amount of storage required 
to contain the coded information. 

3. Storage requirements

The major advantage of superimposed coding lies in 
the great economy of storage. In the pilot program 
which was run, a text file of 100,000 bytes was used to 
produce a code file requiring 3,000 bytes. This reduction
of 30 to 1 from the text to the code file is far better 
than the ratio obtained with the threaded list and 
inverted files. Such reductions are largest with small 
files such as the one experimented with, but substantial 
reductions do exist even in larger files. For example, 
assume that the text file consisted of 10,000,000 bibli- 
ographic entries, each containing 12 words which will 
be coded. Such an author-title entry was found to have 
roughly 300 characters in it, implying that the text 
file would be roughly $3 \times 10^{9}$ characters. Also assume
that an average search contains at least three signifi- 
cant words. Such an assumption is made on the grounds 
that a search based on fewer words would tend to return 
more titles than would be of interest due to the very 
large size of the bibliography. From these two assump- 
tions, utilizing considerations explained in Appendix B, 
it is found that the code file would consist of seven 
code words and one pointer for each record. Each of 
the code words is produced in a manner similar to the 
single code word mentioned before. Now, however, 
once the trimmed form of the word is found seven 
different procedures are applied to produce the pseudo- 
random number between 1 and N for each of the seven 
code words. Each of the code words will have 24 bits 
and the pointer will have 32 bits, thus indicating that 
each record will produce 25 bytes of code in the code 
file. The total size of the code file would then be
$2.5 \times 10^{8}$ bytes, which still is a reduction of better than
10 to 1. 

Such a reduction is far out of reach of an inverted 
file since each record in the text would have to have 
twelve 24 bit pointers pointing to it, and one 32 bit 
pointer from the record to the starting position of that 
record in the text file. This requires a total of $4 \times 10^{8}$
bytes and indicates only a portion of the room taken 
up by the inverted file. It does not include the vocabu- 
lary file which would be substantial, nor does it en- 
compass the overhead of the occurrence file consisting 
of markers for the boundary between lists of pointers 
for a given word. Also it ignores the room which must 
be set aside for a linking pointer in case a new occur- 
rence is to be added. 
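
The storage figures quoted in the two preceding paragraphs can be checked with a few lines of arithmetic. The Python fragment below only restates the numbers given in the text; the variable names are illustrative.

```python
RECORDS = 10_000_000
CHARS_PER_RECORD = 300
WORDS_PER_RECORD = 12

text_bytes = RECORDS * CHARS_PER_RECORD          # 3 x 10^9 characters

# Superimposed coding: seven 24-bit code words plus one 32-bit pointer.
code_bits = 7 * 24 + 32                          # 200 bits = 25 bytes per record
code_file_bytes = RECORDS * code_bits // 8       # 2.5 x 10^8 bytes

# Inverted file, pointers only: twelve 24-bit pointers plus one 32-bit pointer.
inverted_bits = WORDS_PER_RECORD * 24 + 32       # 320 bits = 40 bytes per record
inverted_bytes = RECORDS * inverted_bits // 8    # 4 x 10^8 bytes

print(text_bytes / code_file_bytes)  # 12.0
print(text_bytes / inverted_bytes)   # 7.5
```

The ratio of 12 to 1 is the "better than 10 to 1" reduction claimed for the code file. The inverted-file figure of 7.5 to 1 covers the pointers only, before the vocabulary file and occurrence-file overhead are counted.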

An additional advantage of the superimposed coding 
lies in the type of storage which can be used to store 
the code file. Since the file will be searched serially 
the storage media need not be random access. This 
permits the use of a cheaper sequential access storage 
device such as magnetic tape, which could greatly 
decrease the cost of such a system. 


4. Speed of search 

Evaluating the speed of a search using superimposed 
coding is difficult since the speed of any implemented 
system depends heavily on the characteristics of the 
storage media containing the code file as well as on 
the obvious consideration of the size of the text file. 
The search can be performed by reading the code file
from bulk storage into addressable memory and comparing
the query codes with the code words in
software. If this is done then the time required to
search the code file can be cut to less than 6 × (the
memory cycle time of the machine) × (the total number
of code words in the code file). This speed can be
achieved due to the simplicity of the comparison which
the software must make. The program only needs to
test if X is a subset of Y by loading the accumulator
with Y, doing a logical AND of the accumulator with
a register which contains X, and testing to see if the
accumulator equals X. When large text files are used,
and there are several independently assigned code words 
for each record, time is saved by being able to reject a 
record when any one of the query codes fails to be a 
subset of the corresponding code word. By taking 
advantage of this a substantial amount of time can 
be saved. In the previously mentioned large file, with 
seven code words for each record and an average 
search of three words, more than 90 percent of the 
records would be rejected after only the first com- 
parison was made. This means that there would be 
36 memory cycle times (the time allotted for the six 
comparisons which did not have to be made) free to 
take care of the overhead in the searching program. 
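
A rough Python sketch of the early-rejection test, with an estimate of how often the first comparison alone rejects a record, is given below. The estimate assumes a bit density of 0.35 (the value quoted later in the spurious-match example), three distinct ones in the query code, and independently set bits; none of these is stated in this paragraph, so the result is indicative only.

```python
def record_matches(code_words, query_codes):
    """Both arguments are equal-length sequences of integers."""
    # all() stops at the first failing comparison, so most records
    # cost only one subset test.
    return all((q & c) == q for q, c in zip(query_codes, code_words))

bit_density = 0.35   # assumed; taken from the spurious-match example
ones_in_query = 3    # assumes the three search words set three distinct bits

# Chance that an unrelated code word happens to contain all three query bits.
pass_first = bit_density ** ones_in_query
rejected_first = 1 - pass_first
print(round(rejected_first, 3))  # 0.957
```

Under these assumptions about 96 percent of unrelated records fail the first comparison, which is consistent with the "more than 90 percent" stated above.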

Even with this simple and fast searching procedure, 
a search does require longer than the threaded list or 
inverted file. Although the implementation of this tech- 
nique in software is slower, there are several methods 
that radically reduce the amount of time required to 
search the code file. 

Since the algorithm for searching the code file is 
simple, the actual testing to see if X is a subset of Y 
can be done with very simple hardware. If the I'th
bit of X is 1 and the I'th bit of Y is 0 for any of the
values of I from 0 through 7, then X is not a subset of
Y and the value of Z will be 1. If in no case is bit I
of Y=0 and bit I of X=1, then X is a subset of Y
and Z is 0.

Figure 4 — Hardware to test if X is a subset of Y

$$Z = (\bar{Y}_0 \wedge X_0) \vee (\bar{Y}_1 \wedge X_1) \vee \cdots \vee (\bar{Y}_7 \wedge X_7)$$

Considering the speed of present day circuitry the
time required to search a code file would be reduced
to the time required to transfer the data from bulk
storage. Since the hardware is so simple, it is practical
to scan data from several sources simultaneously. An
alternative to having the file searched externally would
be to wire into read only memory the commands to 
test for a subset. By adding instructions to use the 
next code word and repeat the operation if the test 
fails, the search will proceed through core memory at 
a rapid rate making only one core access for each test. 
The end of the list of code words can be marked by a 
code word containing all ones. This has any possible 
query as a subset and would assure that the loop 
was interrupted at that point. 

A second technique which would reduce the time 
required to search the file is to sort it in some manner. 
One such method which generates a superimposed 8 
bit code from a 24 bit code is discussed in Appendix A. 
Other methods such as carefully dividing the code file 
into small groups and then doing a logical OR of the 
chosen code words to form rejector vectors have been 
suggested. 4 

In comparing the speed of the search it should be 
noted that with superimposed coding and when searching
for several words, the search for all of the words
is carried out at once. In the threaded list and inverted 
file a search for several words is made by making a 
list of occurrences for each word and then finding the 


intersection of the lists. Due to this parallelism of the 
search superimposed coding can handle a multiple word 
search in a more efficient manner than the other two 
methods. 

At first glance it appeared that searching the entire 
code file would preclude the use of superimposed coding 
on a large file. With more careful examination, however, 
it is apparent that this type of code file can be searched 
as rapidly as either the threaded list or the faster 
inverted file. 

Factors which lead to this conclusion include: 

A. The code file search can easily be implemented 
in hardware. Such hardware is simple and 
very fast as well as being able to handle 
several streams of data simultaneously. 

B. If several sequential access devices or a 
random access storage device is used then the 
code file may be structured to allow large 
blocks of the code file to be rejected with 
only one test. 

C. The superimposed coded file is much more 
efficient at handling searches for records con- 
taining several desired keys. 

5. Cost 

The cost of implementing an information retrieval 
system utilizing the type of superimposed coding sug- 
gested would be substantially less than the cost of 
implementing a threaded list or inverted file using the 
same text file. The reasons for this stem from the 
reduced requirement for computational capability of 
the computer, as well as a substantial reduction in 
the amount of storage required for the coded infor- 
mation. 

All three systems must dedicate a large amount of 
storage to the actual text. This, in all of the cases, 
can be either directly accessible to the computer such 
as a large disk file, or may be only machine referable 
such as a machine controllable microfilm display, like 
the proposed system at the University of California, 
Santa Cruz or the one being used as part of Project 
Intrex at M.I.T. 6 The difference of storage cost is not 
found in the storage of the text file but in the com- 
parison of the cost of the storage of the code file of 
the superimposed coding system with the cost of storing 
the vocabulary and occurrence files of the threaded 
list and inverted file. The code file is smaller and can 
be stored in a sequential access device rather than a 
random access device. Both of these factors tend to 
reduce the cost of the system. 

If scanning of the code file is implemented in hard- 
ware then the requirements on the computer become 
very small. All that it is responsible for is processing
the words in the inquiry in order to generate the query 
codes, and then, while the search is in progress, stand 
by to store the pointers to the text file which the one 
or, possibly several, hardware scanners pass to it. 

The trial program, which processed the questions,
generated the query codes, and handled the searching
in software, was substantially under 16,000 bytes of
code on an IBM 1130 with no overlaying. Thus the 
requirement for expensive core storage is low. The 
cost of the hardware which would do the testing for 
the query code being a subset of the code word and 
its interfacing with the computer would be very small 
compared to the cost of the necessary storage devices. 

One phenomenon which is found in the superimposed 
coding and not in some other forms of coding is the 
presence of spurious matches. These occur because,
in a given code word, the fact that the I'th bit is zero
signifies that any word assigned the code value I is
not in the record. The converse is not true. Since many
vocabulary words could cause the I'th bit to be one,
the I'th bit being equal to one does not indicate that
a specific word is present. By generating several independent
code words for each record the number of
times that superimposing will cause an irrelevant
record to be retrieved can be made arbitrarily small.
Take for example the case where twelve words were
coded into seven 24 bit code words. In that case the
probability that, for a given record, all seven of the
query codes for a question were subsets of the code
words while none of the three words involved in the
search were in the record was $3 \times 10^{-10}$. (See
formula in Appendix B, bd = .35, cw = 2.8, qc = 7)

Since the number of such spurious matches can be 
limited to any desired extent, although not entirely 
eliminated, it is convenient to perform some final 
verifying operation to assure that the words specified 
in the search are actually present. This verification in 
the case of the pilot program was accomplished as a 
side result of the check to see that the desired words 
occurred in the specified order. Consequently there 
was no penalty in making this extra check on the 
records which were retrieved. 

The requirement that additional checking be done 
is not an unreasonable one. The fact that a document 
contains the words in which one is interested does 
not necessarily indicate that the document is of interest. 
Therefore any key word searching procedure can only 
be the first step of an information retrieval system. 
The job of a key word search is to quickly reject 
records that do not contain information of interest. 
In this sense any of the three types of key word infor- 


mation retrieval systems which have been mentioned 
are more properly information screening procedures 
which can rapidly eliminate a large portion of the 
text file as unlikely to contain relevant information. 
Such a system should be used to identify those records 
which warrant further and more extensive examination. 

CONCLUSION 

The method of superimposed coding which has been 
discussed is a simple and relatively inexpensive manner 
of scanning a large text file. With a simple check for 
spurious matches made after the search, such a system 
can stand alone as a key word information retrieval 
system. On the other hand since the actual scanning 
of the text can be easily and rapidly handled by pe- 
ripheral hardware, the method is very attractive as a 
first stage screening method. Although the prospect of 
having to search the entire code file for every inquiry, 
at first glance, appears discouraging, the simplicity of 
the scanning algorithm and the ease with which searches
can be carried out in parallel makes such a linear 
search very reasonable. 

APPENDIX A 

Besides implementation in hardware, measures can 
be taken to eliminate the need for searching the entire 
code file, thus reducing the required search time. One 
manner of doing this is to use the first code word of 
each record to generate a shortened code word for it. 
In the case of a 24 bit code word, the first bit of the
second level code word is the logical OR of the
first three bits of the first level code word. Bits 4 
through 6 could also be ORed and used as the second 
bit of the second level code word. Continuing this 
process an 8 bit second level code word is produced 
based on the bits 1 through 24 of the original code 
word. Since there are only 256 of these second level 
codes possible, with each record’s first code word being 
mapped into one and only one of these classes, the file 
is partitioned into 256 sets characterized by the numbers 
0 through 255. When it is time to search the code file, 
the element of the partition that the first query code 
belongs to is determined. If for example the query 
code is 000100000010000010000000 it would belong to
set 84 (01010100). The only sets which would have to 
be searched would be those characterized by numbers 
which have 84 as a subset (i.e., 11111111, 11111110, 
11111100 would have to be searched, but 11111011 
would not have to be examined further). There would 
be only 32 out of the 256 sets which would have to be 
searched, thus the number of code words which would 
have to be compared with the query codes would be
reduced. Using the scheme of coding 12 words into 24 
bits would cause roughly 10 percent of the code file to 
be classified as 255 (11111111) and just over 3 percent 
to be classified by a number whose binary representation 
contains 7 ones and one zero. Due to the non-uniform
distribution of the code words over the 256 sets, the 
reduction in the amount of the code file to be searched 
would not be the 7/8 suggested by the reduction in 
the number of sets which must be searched. The re- 
duction would, however, be in the neighborhood of 
30 percent (3/8 of the sets whose binary representation 
has seven ones and one zero and 18/28 of those with
six ones and two zeros can be eliminated). 
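
The partitioning scheme of this appendix can be sketched in Python as follows. The 24-bit query code used in the example is chosen so that its three-bit groups OR to 01010100 (set 84), as in the text. The function names and the convention that group 0 is the leftmost three bits are assumptions.

```python
def second_level(code24):
    """OR each group of three bits of a 24-bit code word into one bit."""
    out = 0
    for group in range(8):
        # Group 0 is the leftmost three bits (bits 1 through 3 in the text).
        three = (code24 >> (21 - 3 * group)) & 0b111
        out = (out << 1) | (1 if three else 0)
    return out

def sets_to_search(query_class):
    """Second level codes that have the query's class as a subset."""
    return [s for s in range(256) if (s & query_class) == query_class]

query = 0b000100000010000010000000
query_class = second_level(query)
print(query_class)                       # 84
print(len(sets_to_search(query_class)))  # 32
```

Of the eight classes with seven ones, three lack one of the query's bits and can be skipped; of the 28 classes with six ones, 18 can be skipped. These are the fractions 3/8 and 18/28 quoted above.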

APPENDIX B 

Since care was taken to assign the code values using 
numbers from a uniform distribution, the expected 
number of spurious matches can be predicted. By 
varying the length and number of the code words the 
frequency of spurious matches can be controlled. The 
number of spurious matches is a function of the bit 
density, bd (i.e., the number of ones in the code word 
divided by the number of bits in the code word) ; the 
number of code words per record, cw; the number of 
ones in the query code, qc; and the number of records 
which are coded into the code file N. 

The expected number of spurious matches =

$$N \times (bd)^{cw \times qc}$$

The number of bits used to code one record =
cw × (the number of bits in the code word)

By keeping the number of bits used and the number of 
ones in a code word constant in the above two equations, 
it is found that the minimum number of spurious 
matches occurs when the number of bits in the code 
word is e times the number of ones in the code word. 
That is when the bit density is 1/e. The number of 
bits B to use for the code word when there are M 
words to be coded in each record is roughly 2.2M. 
This is found by considering that the probability that
a given position will be left blank is $(1 - 1/B)^{M}$. The
expected bit density would then be $1 - (1 - 1/B)^{M}$.
Setting this equal to 1/e and solving for B yields
the desired results. 3
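
The relations in this appendix can be evaluated numerically. The Python sketch below reads the spurious-match formula as N multiplied by bd raised to the power cw times qc, and solves the bit-density equation for B directly. The function names are illustrative, and that reading of the formula is an assumption drawn from the definitions above.

```python
import math

def expected_spurious(n_records, bd, cw, qc):
    """Expected number of spurious matches: N * bd ** (cw * qc)."""
    return n_records * bd ** (cw * qc)

def expected_bit_density(bits, words):
    """Chance that a given position is set after coding `words` words into `bits` bits."""
    return 1 - (1 - 1 / bits) ** words

def bits_for_density(words, target=1 / math.e):
    """Solve 1 - (1 - 1/B)**M = target for B."""
    return 1 / (1 - (1 - target) ** (1 / words))

print(round(bits_for_density(12) / 12, 2))         # 2.22
print(round(expected_bit_density(24, 12), 2))      # 0.4
print(expected_spurious(1, 0.35, 7, 3) < 4e-10)    # True
```

For twelve words per record the solution is close to 2.2M bits, as stated. With a bit density of 0.35 and an exponent of 7 × 3 = 21, the chance that a single record matches spuriously comes to roughly 3 × 10^-10, the figure quoted in the discussion of spurious matches.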

APPENDIX C 

The trimming program was divided into three 
sections. The first step removes all ‘e’s, ‘d’s and ‘s’s
from the end of the word. These letters were removed
since there are many words such as ‘attractions’ which
have compound endings terminating in s, es, d, and
ed. By removing these letters, in the above, the suffix,
‘tion’, is left on the end of the word where it can be
easily identified and removed in a later section of the
program. Once this operation is completed the endings
‘er’ then ‘ly’ and then ‘al’ are searched for and removed
if found. This procedure removes endings such as the
‘ally’ on the end of ‘functionally’ and again is a technique
to handle compound endings.

After the above two trimmings have been accomplished,
the Trim List is consulted. Suffixes found in the
Trim List are arranged in order by length, starting
with the longest. The ending found in the list is compared
letter by letter with corresponding letters on
the end of the word remaining after the first two trimming
stages have been completed. Since all of the ‘e’s,
‘s’s and ‘d’s have been removed, the suffixes are in an
unusual form. For example, ‘ness’ would have been
trimmed to ‘n’ by the first stage of the trimming
procedure. Also ‘ance’ appears as ‘anc’ in the Trim
List.

The reason for having suffixes in this form can be
seen by considering the problem of trimming the two
words ‘finance’ and ‘financed’. In the second case,
when the ‘ed’ is found on the end of the word, it is
difficult to decide if the ‘ed’ or just the ‘d’ should be
removed. The decision was made to remove the ‘ed’.
This means that to trim ‘financed’, ‘anc’ must be in
the Trim List. However, ‘finance’, which should be reduced
to the same pseudo-root, requires either the
ending ‘ance’ to appear in the list or the ‘e’ removed
before the ending is compared with endings in the
Trim List. The second course of action was chosen
because it reduces the length of the Trim List and makes
the first step of the trimming operation very simple.

The comparison of the endings in the Trim List is
continued until either the list is exhausted or a match
is found and the ending removed. There are two more
checks to be made on the trimmed word. First, the
last two letters of the word are compared. If they
are the same, then the last letter is removed. This is
done so that a word such as ‘trimming’ will be cut
back to ‘trim’. First the ‘ing’ is removed to give ‘trimm’,
and then the second ‘m’ is removed to give the desired
root.

The final action provides some protection against
trimming words too severely. The word ‘deeds’ would
be trimmed to nothing. To prevent such loss of information,
any word which has been reduced to less
than three letters is restored to a length of three. At
this point the word is considered trimmed.
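
The stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual Trim List is not reproduced in this appendix, so the four entries in `TRIM_LIST` are hypothetical, and restoring the first three letters of the original word is one possible reading of "restored to a length of three". The names `TRIM_LIST` and `trim` are invented for the sketch.

```python
# Hypothetical Trim List: the appendix does not reproduce the real one.
# Entries are stored in their stage-one form (e.g. 'anc' rather than 'ance')
# and sorted longest first, as the text describes.
TRIM_LIST = sorted(["ation", "tion", "anc", "ing"], key=len, reverse=True)


def trim(word: str) -> str:
    """Reduce a lower-case word to a pseudo-root in the stages described above."""
    # Stage 1: remove every trailing 'e', 'd' and 's'.
    stem = word.rstrip("eds")

    # Stage 2: remove 'er', then 'ly', then 'al', in that order.
    for ending in ("er", "ly", "al"):
        if stem.endswith(ending):
            stem = stem[: -len(ending)]

    # Stage 3: remove the first (longest) matching Trim List suffix.
    for suffix in TRIM_LIST:
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break

    # Check 1: collapse a doubled final letter ('trimm' -> 'trim').
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        stem = stem[:-1]

    # Check 2: never return fewer than three letters. Restoring the first
    # three letters of the original word is an assumption of this sketch.
    if len(stem) < 3:
        stem = word[:3]
    return stem


for w in ("trimming", "finance", "financed", "deeds", "information", "informal"):
    print(w, "->", trim(w))
```

With this hypothetical list, ‘finance’ and ‘financed’ reduce to the same pseudo-root, and ‘information’ and ‘informal’ both reduce to ‘inform’, which is the false-match case discussed in the next paragraph.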

There is one major problem which occurs with the
use of a trimming algorithm. Words which do not
convey the same meaning can be reduced to the same 
root. An example would be that both ‘information’ 
and ‘informal’ are reduced to ‘inform’. Such a result 
may be undesirable; it is unlikely that when searching 
for one of the words, the other would be of interest. 
Unfortunately the effect of this type of false retrieval 
could not be observed in the small pilot program. 
Such confusion of terms was rare due to the specialized 
nature of the text. In a system utilizing a larger text 
file containing a more generalized vocabulary, the 
number of such erroneous replies may become sub- 
stantial. If a system utilizing a trimmed form of the 
vocabulary words is used for the first stage of an infor- 
mation retrieval system, the problem of such extra 
records is not a serious one, since the purpose of the 
search is to locate information-rich sections of the 
text. Further examination would determine whether 
the record is of interest or not. 

The decision to utilize a trimming algorithm in the
pilot program was based on the feeling that the error
of failing to retrieve information was less tolerable
than retrieving some irrelevant information.

APPENDIX D

Establishment and maintenance of a
storage hierarchy for an on-line data
base under TSS/360

by JAMES P. CONSIDINE and ALLAN H. WEIS

Yorktown Heights, New York


INTRODUCTION 

As on-line interactive systems increase in popularity, 
several problem areas become more and more apparent. 
One of these is the management of the on-line accessible 
data base. It has been the experience of installations 
throughout the country that such a data base tends, if 
ungoverned, to increase in size as the system continues 
in operation, bounded only by the size of the storage 
available to contain it. It is, therefore, essential for
the continuance of a viable system that this data base 
be examined and methods devised to control its growth. 

In the first section of this paper we record some 
observations we have made on the nature of one par- 
ticular on-line data base, specifically its growth and 
usage characteristics. The second section details a 
system we have designed to control the growth of the 
data base and insure maximum utilization of the on-line 
devices available. The third section describes the 
results of operating with the system. The fourth section 
details future amplifications and modifications to over- 
come some foreseeable difficulties in the present version. 
Finally we summarize our observations and re-state 
the conclusions we have reached. 

TSS/360 data base at T. J. Watson Research Center

Since our system first went on a somewhat regular 
schedule of four-hour-a-day user sessions in June 1968, 
it was clear that, even under these conditions of rela- 
tively low availability, managing the on-line storage 
was going to be one of our primary problems. The
amount of on-line storage occupied by user data sets at
that time was approximately 20,000 pages, or 80,000,000 
characters (1 page = 4096 characters or 8192 hexa- 
decimal digits). It was a matter of a few months before 
the amount rose to what is our working optimum, 
30,000 pages or 120,000,000 characters. This optimum 
is dictated by the maximum number of devices we 
wish to devote to on-line storage. The distinction 
between devices and volumes should be made clear. A 
volume is a unit on which data are actually recorded. 
There are in principle large numbers of volumes availa- 
ble. A device is a unit on which a volume is mounted 
and which carries out the transmission of data to and 
from the volume. Devices are necessarily limited in 
number. A tape reel is a volume; the tape drive is a 
device. 
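
The page figures quoted above are rounded. A short check of the arithmetic, assuming only the stated page size of 4096 characters (two hexadecimal digits per character), is sketched here; the names are illustrative.

```python
PAGE_CHARACTERS = 4096          # characters per page, as stated in the text
HEX_DIGITS_PER_CHARACTER = 2    # 8192 hexadecimal digits per page


def pages_to_characters(pages: int) -> int:
    """Convert a page count to the number of characters it holds."""
    return pages * PAGE_CHARACTERS


for pages in (20_000, 30_000):
    print(pages, "pages =", pages_to_characters(pages), "characters")
```

The exact values are 81,920,000 and 122,880,000 characters, which the text rounds to 80,000,000 and 120,000,000.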

To return to the data base, observations made at 
the time indicated that perhaps 10-20 percent of this 
data was non-useful. Examples of this are data sets 
defined but not used and never erased, output listings 
of assemblies and compilations done many days previ- 
ous to the current date and other such system- and 
user-generated residues. Measures were devised to 
periodically and systematically remove such unwanted 
data from the on-line storage, thereby achieving a 
small amount of leeway while the problem was being 
further studied. 

In an effort to acquire information on the usage of 
the data base, we implemented a means of marking a 
data set with the date on which it was used. Report 
programs were written to process the data thus recorded
and the results were very informative to management
and system programmers alike.

Extracts from a typical report are presented in 
Figure 1. Among the facts which can be determined 
from such reports are the names of the authorized users 
actually using the system currently, how much storage 
each user is occupying, how much he is using, and how 
the amount of storage used by each user varies from
observation period to observation period. The total
amount of on-line storage that is being currently used
by all users is also recorded. In addition, the data
recorded can be processed to yield an on-line storage 
profile, as shown in Figure 2. 

For instance, in the reports formulated from data 
gathered on February 1, 1969 we discovered that of 
our 160 or so authorized users, some 50 had actually 
used the system since the beginning of the year. We 
also found that most of these 50 were not actually 
using all of the storage they were occupying. In one 
case, up to 95 percent of the storage of a particular 
user had not been used during the period. In total we 
discovered that of some 28,000 pages of storage on 
the system only 13,000 pages had been used in the 
last month. These figures were based on information 
recorded after all the waste space occupied by obviously
useless data had been reclaimed. Similar data are being
recorded periodically to monitor in a limited way the
interactions of the users with the system. The amount
of available on-line storage is recorded every time the
system is loaded into the machine, a process which
takes place three or four times in a fourteen hour day.
The usage characteristics are recorded much less
frequently, perhaps once or twice a month. Thus far
observations on this somewhat expanded time scale
have been more than sufficient to give evidence of
imminent difficulties in the matter of on-line storage.

Date data recorded  = 4/4/69
Date for comparison = 3/2/69

Total Pages Used Since 3/2/69   = 14441
Total Pages In System on 4/4/69 = 27662

USERID     PAGES USED   TOTAL PAGES
USER 01          1588          1751
USER 02            11            18
USER 03             0             7
USER 04           263           431

Figure 1 — On-line storage reports

N = no. of pages of on-line storage owned by user

NOTE: The 4 users who own more than 1200 pages each own about 40% of the
available on-line storage

Figure 2a — On-line storage ownership profile

Q = (amt. of on-line storage used by user bet. 3/2 - 4/4/69) /
    (amt. of on-line storage allocated to user on 4/4/69)

NOTE: The number at the top of each column is the fraction of on-line storage owned
by users in that category

Figure 2b — On-line storage usage profile
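
As an illustration of how such a report can be reduced to a usage quotient (storage used divided by storage allocated, the quantity plotted in Figure 2b), the following sketch uses the four users and the totals shown in Figure 1. The function name `quotient` and the treatment of an empty allocation as 0.0 are assumptions of the sketch.

```python
# Extract from the Figure 1 report:
# user -> (pages used since 3/2/69, total pages on 4/4/69).
report = {
    "USER 01": (1588, 1751),
    "USER 02": (11, 18),
    "USER 03": (0, 7),
    "USER 04": (263, 431),
}
TOTAL_USED, TOTAL_PAGES = 14441, 27662


def quotient(used: int, total: int) -> float:
    """Fraction of allocated pages actually used; 0.0 for an empty allocation."""
    return used / total if total else 0.0


for user, (used, total) in report.items():
    print(f"{user}: Q = {quotient(used, total):.2f}")
print(f"all users: Q = {quotient(TOTAL_USED, TOTAL_PAGES):.2f}")
```

For the system as a whole the quotient comes to roughly one half, in line with the earlier observation that only about half of the on-line pages had been used in the preceding month.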

Even though these measurements were made under 
conditions of limited availability, they gave clear indi- 
cation of the existing problems involving the manage- 
ment of the on-line data base and the control of its 
size. We realized at an early stage that unless some 
steps were taken to reduce the amount of data maintained
on-line, it would be impossible to operate the
system in our user environment with the on-line storage 
capacity then available. As indicated earlier, the 
problem is by no means unique to our installation. 
Various means have been adopted to handle the 
problem of controlling the size of on-line data bases. 
One installation requires that each user validate every 
file he wishes to retain once in every twenty-four hour 
period. Unvalidated files are erased. Another approach, 
similar in some respects to the one which we will 
describe, is the “Date Deletion” scheme which has 
been in effect for some time on the Compatible Time 
Sharing System at Massachusetts Institute of Tech- 
nology. 1 

Since we felt at that time that we did not want to 
place the primary burden of storage management on 
the user, we looked for some systematic way of re- 
stricting the amount of data stored on-line. We wanted 
to combine ease of operation with convenience for the 
users. It seemed clear that a potentially vast conden- 
sation of the on-line data base could be achieved by 
systematically moving unused data sets from on-line 
storage to demountable storage volumes. The under- 
lying assumption would be that the overhead involved 
in restoring data sets that might be required by the 
users would be small compared to the advantages to 
be gained by being able to reduce the amount of on- 
line storage required at any one time. There were no 
observations of actual data set usage available to 
verify such an assumption, or to support any alterna- 
tive, so we proceeded to implement a simple design 
to alleviate in part our pressing problem, and also at 
the same time to provide the experience necessary to 
evaluate the underlying assumptions. This “data mi- 
gration” scheme is described in the following section. 

Management of the on-line data base 

Because of the limited amount of on-line storage 
available, it appeared necessary to us to establish a 
hierarchy of storage volumes, ranging from high-speed 
permanently mounted direct access volumes to low- 
speed demountable magnetic tapes. The establishment 
of such a hierarchy immediately implies a mechanism 
for distributing data among the various classes of 
volumes according to some predefined or even dy- 
namically defined criteria. 

Initially, in TSS/360 three categories of storage 
volumes suggest themselves: first, on-line direct-access 
volumes; second, off-line direct-access volumes, which 
would require mounting to enable the retrieval of infor- 
mation from them; and third, tape volumes, which 
would require mounting and, of course, have a lower
data transmission rate than the direct access volumes.
The first category comprises what are described in the 
TSS/360 system literature 2 as public storage volumes. 
Categories two and three are handled by TSS as pri- 
vate storage volumes. In our discussions below, the 
term “archival” storage will be used to refer to storage 
volumes of categories two and three which are pro- 
cessed by the migration scheme. As far as the rest of 
TSS/360 is concerned, these volumes constitute a sub- 
set of the general class “private storage volumes”. 

The criteria to be used to govern the arrangement 
of data among the categories of volume are obviously 
the subject of wide differences of opinion. We have 
been limited in our considerations of this topic by the 
information that can be collected on our system about 
the usage of individual data sets. We have chosen to 
base our criterion on the information mentioned in the
first section, i.e., the date on which the data set was last 
used. Specifically, a data set is useful or not depending 
only on the length of time since its last use. This is 
admittedly a very simple basis for judgment but for 
the moment it is what is available. Alternatives will 
be discussed briefly in the fourth section. The scheme 
has been designed to enable easy inclusion of other 
migration criteria as they are deemed necessary and 
the required information becomes available. It has 
been implemented in the form of seven commands and 
an auxiliary data set, which records the status of the 
data sets moved to archival storage. The commands 
are RMPS, MPS, EMDS, LMDS, RMDS, SAVE, and 
CMS. The data set is called SYSMDS. A brief de- 
scription of each of these commands and the data set 
follows: 

RMPS — Recreate and Migrate Public Storage 

This command and the one that follows, MPS, are 
modifications of the TSS/360 system command, RPS 
(Recreate Public Storage). 3 The RPS command is used 
to copy the contents of current public storage, one 
volume at a time, onto a new set of public volumes, 
leaving behind in the process useless data sets and 
producing a new system with cleaner public storage. 
The RMPS command adds the criterion of currency to 
the criteria of usefulness already in the RPS command. 
If a data set fails this test of usefulness, instead of 
being copied onto the new public storage it is copied 
or “migrated” onto an archival volume and cataloged. 
In addition, relevant data regarding this “migration” 
are recorded in a special data set called SYSMDS. 
The format of this data set will be discussed later. 
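
The currency test that RMPS adds can be pictured as a simple partition of public storage by date last used. The sketch below is not the TSS/360 implementation; `DataSet` and `split_by_currency` are hypothetical names, the sample data sets are invented, and the January 1, 1969 cutoff is the one reported for the first migration later in this paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSet:
    name: str
    pages: int
    last_used: date


def split_by_currency(public, cutoff):
    """Return (kept, migrated): a data set stays on-line only if it has
    been used on or after the cutoff date."""
    kept = [ds for ds in public if ds.last_used >= cutoff]
    migrated = [ds for ds in public if ds.last_used < cutoff]
    return kept, migrated


# Invented sample data, for illustration only.
public = [
    DataSet("USER01.SOURCE", 120, date(1969, 2, 14)),
    DataSet("USER01.LISTING", 300, date(1968, 11, 3)),
    DataSet("USER02.DATA", 45, date(1969, 1, 1)),
]
kept, migrated = split_by_currency(public, cutoff=date(1969, 1, 1))
print("kept on-line:", [ds.name for ds in kept])
print("migrated:", [ds.name for ds in migrated])
```

Keeping the test in one function reflects the stated design aim that other migration criteria can be substituted as the necessary information becomes available.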

The fact that the data sets which have been moved
to archival storage are cataloged requires some elucidation.
They are cataloged so as to prevent the duplication of
data set names on public storage and on archival storage.
Only one entry for a given data set name may appear in the
catalog for each user. Private volume handling, however, is a
sensitive area of TSS/360. If a user requests that a
private volume be mounted and his request is granted,
the device on which the volume is mounted remains
assigned to him until he specifically releases it or terminates
his session. Thus, if the user were allowed to
directly access an archival volume simply by requesting
one of his migrated data sets from the catalog, this
volume could well remain mounted on the device for
several hours. This would render it almost indistinguishable
from a public volume, and defeat the purpose of
the migration scheme.

We have avoided this by specifying the first three
characters of the volume identification of all our archival
storage volumes as ‘SAV’. A minor modification
to the system prohibits the user from directly accessing
any volume whose identification begins with these
three letters. Commands described below perform any
service he may require which involves these volumes
and always release the volumes, thus freeing the device,
as soon as the service has been performed. This assures
that devices will be in use as little as possible for purposes
dealing with the handling of archival storage.
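
The two rules in the preceding paragraph (no direct user access to ‘SAV’ volumes, and release of the device as soon as a service completes) might be sketched as follows. The names `user_may_mount` and `mounted`, the list of free devices, and the volume identifiers are all hypothetical; the actual modification to TSS/360 is not described in this paper.

```python
from contextlib import contextmanager

ARCHIVAL_PREFIX = "SAV"


def user_may_mount(volume_id: str) -> bool:
    """A user may not directly mount a volume whose id begins with 'SAV'."""
    return not volume_id.startswith(ARCHIVAL_PREFIX)


@contextmanager
def mounted(free_devices: list, volume_id: str):
    """Assign a free device to a volume for the duration of one service and
    always release it afterwards, even if the service fails."""
    device = free_devices.pop()
    try:
        yield device
    finally:
        free_devices.append(device)


free_devices = ["DEVICE-1"]          # hypothetical device name
print(user_may_mount("SAV001"))      # archival volume: direct access refused
with mounted(free_devices, "SAV001") as device:
    print("servicing request on", device)
print("free devices afterwards:", free_devices)
```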

SYSMDS — Migrated Data Set Record 

It is appropriate at this point to discuss in some 
detail the SYSMDS data set. It is an indexed se- 
quential data set with an entry for each migrated data 
set with the data set name as key. These entries have
the format illustrated in Figure 3. As well as being a 
record of the migration, the information stored in 
SYSMDS is also sufficient to recreate the catalog 
entries for the data sets moved to archival storage. In 
addition, there is an entry for each of the archival 
storage volumes. Included in these entries are the 
amount of available space on each volume, the number 
of pages to be erased, the number of erase pending 
data sets, and the total number of data sets on the 
volume. These entries have as key the nine characters
‘ZZZZZZZZ.’ followed by the six character volume 
identification. There is also a record containing the 
total number of archival pages erased and the total 
number restored to on-line storage. The information 
is all in EBCDIC characters so as to make it available 
by simply printing SYSMDS. An up-to-date copy of 
the SYSMDS data set is made after each modification 
and stored on the system residence volume (to insure
continuity between successive versions of public
storage).

LOCATION   CONTENTS

0-44       Data Set Name
49-50      Data Set Organization
           (Sequential, Partitioned, etc.)
53-56      Number of Pages (Size of the Data Set)
59-64      Date Created - 'DDD/YY'
           (010/69 indicates the tenth day of 1969)
67-72      Date Last Used
75-80      Date Migrated
83-88      Archival Volume Identification
91-94      Archival Volume Type
97-99      File Sequence Number (for tape volumes only)
102        'Erase Pending'
           (has the value 'Y' for Yes and 'N' for No)

NOTE: For a full discussion of TSS/360 terminology please consult
Reference 2.

Figure 3 — Format of the data set entry in SYSMDS
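
A fixed-position record of this kind is straightforward to build and read back. The sketch below follows the character positions of Figure 3; the field names, the helper functions, the sample values and the assumption that two-digit years mean 19YY are all introduced here for illustration and are not part of TSS/360.

```python
from datetime import date, timedelta

# (first, last) character positions, inclusive, taken from Figure 3.
FIELDS = {
    "name": (0, 44),
    "organization": (49, 50),
    "pages": (53, 56),
    "created": (59, 64),
    "last_used": (67, 72),
    "migrated": (75, 80),
    "volume": (83, 88),
    "volume_type": (91, 94),
    "file_seq": (97, 99),
    "erase_pending": (102, 102),
}
RECORD_LENGTH = 103


def build_entry(**values) -> str:
    """Lay the given field values out at the Figure 3 positions, blank padded."""
    record = [" "] * RECORD_LENGTH
    for field, (first, last) in FIELDS.items():
        width = last - first + 1
        record[first:last + 1] = str(values.get(field, "")).ljust(width)[:width]
    return "".join(record)


def parse_entry(record: str) -> dict:
    """Split a record back into its fields, with padding removed."""
    return {f: record[a:b + 1].strip() for f, (a, b) in FIELDS.items()}


def parse_date(text: str) -> date:
    """Convert 'DDD/YY' (day of year / year) to a date, assuming 19YY."""
    day, year = text.split("/")
    return date(1900 + int(year), 1, 1) + timedelta(days=int(day) - 1)


def volume_key(volume_id: str) -> str:
    """Key of a volume entry: 'ZZZZZZZZ.' followed by the volume id."""
    return "ZZZZZZZZ." + volume_id


# Hypothetical sample entry.
entry = build_entry(name="USER01.REPORT", pages=12, created="010/69",
                    last_used="032/69", migrated="069/69",
                    volume="SAV001", erase_pending="N")
print(parse_entry(entry))
print(parse_date("010/69"), volume_key("SAV001"))
```

Because every field is stored as printable characters, such a record can be inspected simply by printing it, which is the property noted in the text.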

MPS — Migrate Public Storage

This command differs from RMPS in that when it 
operates on current public storage it moves only those 
data sets which fail the test of currency. They are 
moved to the appropriate archival storage volume and 
the copies in public storage are erased. Appropriate 
entries are made in the SYSMDS data set. This com- 
mand can also be applied to archival direct access 
storage volumes, producing an additional level of 
storage on tape, creating the three-level structure 
described earlier. Again, the entries in SYSMDS are 
amended to reflect the changes brought about by the 
execution of the command. 

The two commands RMPS and MPS are the primary 
means by which out-of-date files are moved from on-line
to archival storage. The next group of commands
is concerned with enabling the user to examine the 
contents of archival storage and modify the number 
and status of his files which are stored there. 

LMDS — List Migrated Data Sets 

This command enables the user to determine which 
of his data sets have been moved to archival storage.
In addition to the name of each data set, information
such as its organization, size, date last used, date 
migrated, etc., is provided. 

EMDS — Erase Migrated Data Set 

This command enables the user to specify that a 
data set of his which is on archival storage is to be 
erased. This command simply marks the appropriate 
entry in the SYSMDS data set as “erase pending” for
subsequent processing by the CMS command (q.v.). 
The data set name is not removed from the catalog 
until the actual erasure on the volume has been carried 
out by the CMS command. The user may specify that 
either a specific data set or all his data sets on archival 
storage are to be erased. 

RMDS — Restore Migrated Data Set

This command enables the user to bring about the 
return of a data set from archival storage to on-line 
public storage. The process occurs while the user waits. 
The data set is copied from the appropriate archival 
storage volume onto on-line public storage and the 
copy on archival storage is erased. Appropriate entries 
in the SYSMDS data set are amended to reflect the 
results of this operation. The archival storage volume 
is then released, making the device again available 
for allocation. 

SAVE — Put A Copy Onto Secondary Storage 

This command enables the user to specify a data 
set as one to be migrated at the next execution of the 
RMPS or MPS command. 

The maintenance of archival storage is carried out 
by the use of two commands. The first, MPS, discussed 
above, can be applied to the demountable direct-access
‘SAV’ volumes to produce a second level of archival
storage consisting of data sets whose last use is more 
remote in time than those on the first, direct-access 
level. These would generally be stored on tape. The 
second maintenance command is the CMS command 
which will now be described. 

CMS — Clean Migrated Storage 

This command examines data set entries in the 
SYSMDS data set for the “erase pending” flag set by 
EMDS to indicate that the corresponding data set is 
to be erased. The data sets are erased if they are on 
direct access volumes. In any case, the entries for the 
data sets in SYSMDS are deleted and the appropriate 
volume entries are amended to reflect the results of
these transactions. If the number of valid data sets
on a tape volume becomes zero, the tape is released or 
made available for further use for migration. 
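
The division of labor between EMDS (mark) and CMS (erase and clean up) can be sketched with two small functions. SYSMDS is modeled here as a plain dictionary and the volume entries as a second dictionary. The field names, the sample data, and the choice to update the volume counters at the moment EMDS is issued are assumptions of the sketch rather than documented behavior.

```python
def emds(sysmds, volumes, name):
    """Mark one migrated data set 'erase pending'; nothing is erased yet."""
    entry = sysmds[name]
    if not entry["erase_pending"]:
        entry["erase_pending"] = True
        vol = volumes[entry["volume"]]
        # Assumption: the volume counters are updated as soon as EMDS is issued.
        vol["erase_pending"] += 1
        vol["pages_to_erase"] += entry["pages"]


def cms(sysmds, volumes):
    """Process all 'erase pending' entries; return the tape volumes released."""
    released = []
    pending = [n for n, e in sysmds.items() if e["erase_pending"]]
    for name in pending:
        entry = sysmds.pop(name)              # the entry is deleted in any case
        vol = volumes[entry["volume"]]
        vol["erase_pending"] -= 1
        vol["pages_to_erase"] -= entry["pages"]
        vol["data_sets"] -= 1
        if vol["type"] == "disk":
            vol["free_pages"] += entry["pages"]    # space reclaimed on erasure
        elif vol["data_sets"] == 0:
            released.append(entry["volume"])       # empty tape can be reused
    return released


# Hypothetical sample state.
sysmds = {
    "USER01.OLD": {"volume": "SAV001", "pages": 40, "erase_pending": False},
    "USER01.TAPE1": {"volume": "SAV900", "pages": 10, "erase_pending": False},
    "USER02.TAPE2": {"volume": "SAV900", "pages": 25, "erase_pending": False},
}
volumes = {
    "SAV001": {"type": "disk", "free_pages": 100, "pages_to_erase": 0,
               "erase_pending": 0, "data_sets": 1},
    "SAV900": {"type": "tape", "free_pages": 0, "pages_to_erase": 0,
               "erase_pending": 0, "data_sets": 2},
}
emds(sysmds, volumes, "USER01.OLD")
emds(sysmds, volumes, "USER01.TAPE1")
print("released after first pass:", cms(sysmds, volumes))
emds(sysmds, volumes, "USER02.TAPE2")
print("released after second pass:", cms(sysmds, volumes))
```

Deferring the physical erasure to CMS means that an EMDS request never requires a volume to be mounted while the user waits.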

Results observed after migration 

The first migration was carried out on March 10, 1969 
in the process of converting our system from Version 
2.0 to Version 4.0 of TSS. The criterion used was that 
a data set should have been used since January 1, 
1969 to remain in on-line public storage. Operating 
problems prevented the processing of two of our six 
public volumes at that time. In the ensuing month an 
additional 3,000 pages were moved to archival storage. 
It should be pointed out that if the amount of data 
which was moved to archival storage had been returned 
completely to the current on-line public storage, we 
would not have had enough devices available to con- 
tain it. Thus the project did not simply justify itself; 
it proved essential to the continued life of the system. 

Since that time the process has been carried out at 
approximately one-month intervals. The status of on- 
line storage as of July 1, 1969 is reflected in Figure 4 
which is presented for comparison with Figure 2. It 
can be seen from Figure 4 that the overall character- 
istics of the on-line data base have not changed a great 
deal in the intervening three months. There are about 
twenty more users owning data sets on-line than there 
were in April, but the ownership profile remains almost 
exactly the same. Figure 4b reveals a noticeable in- 
crease in the degree of utilization of on-line storage. 
This is indicated on the whole by the increase in the 
value of Q, the utilization quotient, calculated for all 
users, and in detail, by the shift toward higher values 
of Q, especially visible between Q = 0.7 and Q = 1.0. 
Figure 5 contains similar information for the total 
storage on the system, i.e., on-line storage plus archival 
storage. This total storage is what would have to be 
stored on-line in the absence of migration, assuming 
there were enough devices to do so. The total storage, 
thus defined, is about 51,000 pages, of which some 32,000 
are on-line and about 19,000 are archival. One can 
observe that the shape of the total storage ownership 
profile (Figure 5a) is very similar to that of the on- 
line storage profile. Figure 6 gives an idea of how the 
amount of storage occupied is divided between archi- 
val and on-line. Looking at this figure, one should 
be aware that there are thirty-six users who have 
no on-line storage, and thus cannot be classified 
as active. They are taking advantage, consciously or 
unconsciously, of the archival property of the mi- 
gration volumes and leaving all their data stored in 
this fashion. It should be pointed out that we have
made no effort to encourage our users to police themselves
in their use of on-line storage. Thus these figures
must not be considered as reflecting what storage
space the users need, but rather what they will occupy
and use if they find it available. An accounting procedure
is being instituted which may result in reductions
by the users of the amount of on-line storage they
occupy. This approach has been used with success in
other applications, e.g., at Stanford University. 4

Figure 4a — On-line storage ownership profile on
July 1, 1969

NOTE: THE NUMBER AT THE TOP OF EACH COLUMN IS THE
FRACTION OF ON-LINE STORAGE OWNED BY USERS
IN THAT CATEGORY

Figure 4b — On-line storage usage profile on
July 1, 1969



N = NO. OF PAGES OF TOTAL STORAGE (ON-LINE + ARCHIVAL)
OWNED BY USER

Figure 5a — Total storage ownership profile on
July 1, 1969

NOTE: THE NUMBER AT THE TOP OF EACH COLUMN REPRESENTS
THE FRACTION OF TOTAL STORAGE OWNED
BY USERS IN THAT CATEGORY

Figure 5b — Total storage usage profile on
July 1, 1969

(AMT. OF ARCHIVAL STORAGE OWNED BY USER ON 7/1/69) /
(AMT. OF TOTAL (ON-LINE + ARCHIVAL) STORAGE OWNED BY
USER ON 7/1/69)

Figure 6 — Archival storage/total storage ratio
distribution

PAGES ON-LINE PAGES MIGRATED PAGES RESTORED PAGES ERASED 

32,000 19,400 000 1100 

Figure 7 — Status of storage as of July 1, 1969 

The status of migrated storage as of July 1, 1969 is 
presented in Figure 7. The small fraction of the mi- 
grated storage that has been restored to on-line storage 
is a favorable sign for the continued success of the 
approach. 

Not at all surprisingly, in our experience with the 
operation of the migration scheme, several drawbacks 
have become apparent. For instance, one of the more 
valuable features of TSS/360 is that it allows users to 
share files with one another. This is made possible by 
links established in the system catalog between the 
directories of the individual users sharing the files. 
Under the present migration scheme, it is not possible 
for these links to survive the migration or restoration 
process. Thus after a migrated data set has been re- 
stored to on-line storage, the users sharing it have to 
re-establish the linkages which make the sharing 
possible. Another shortcoming from the user's point of 
view is that he is made aware of the existence of mi- 
gration whenever he attempts to re-activate a file that 
has not been used recently. A separate action is required 
to make his file available to him once more. Also the 
criterion for migration is too simple to satisfy either 
the system manager or the user. For the manager, it 
is too easily circumvented, while for the user, it does
not sufficiently distinguish between the user who
occupies a large amount of on-line storage and the 
user who has a much smaller amount allotted to him. 
When migration takes place either one may find that 
his data sets have been migrated, and in fact the 
smaller user may find that more of his data have been 
moved to archival storage than the large user's. 

Amplifications and extensions-the evolution of migration 

There are several areas in which improvements are 
projected. These might be stated as goals in the imple- 
mentation of a good migration scheme. 

a. Migration should be transparent to the user 
except for the wait involved while a data set is 
restored to on-line storage. No action of the 
user other than his wish to use his data set 
should be required to activate the restoration 
process. 

b. There should be reasonable criteria for mi- 
gration and the information necessary to evalu- 
ate them should be available. 

c. There should be a migration ‘monitor’ to de-
termine the extent to which migration is neces- 
sary based on the condition of public storage, 
the amount of storage available, etc. In ad- 
dition, based on system load, the monitor would 
schedule the migration process so as to have a 
minimum impact on system performance. 

We are attempting to address a. and c. in a unified 
way. The first step is, of course, to allow the migration 
routines to be invoked by other programs as well as 
by commands from the terminal. Then the transparency 
problem can be handled by having the routines which 
supervise the user's access to his on-line data sets 
recognize that a data set has been migrated and initi- 
ate the process of restoration of the data set to on- 
line storage. The next step will be having the migration 
to archival storage activated by a routine which from 
time to time monitors the state of on-line storage and 
determines when more on-line space is required. Thus 
the necessity of programmer or operator intervention 
to initiate the migration process will be eliminated. 
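
One way such a monitoring routine could decide what to migrate is sketched below. The paper does not specify the policy, so the two thresholds (`high_water`, `low_water`), the oldest-first selection order and the sample data sets are all assumptions. The figure of 30,000 pages is the working optimum mentioned earlier and is used here only as an example trigger.

```python
from datetime import date


def plan_migration(data_sets, high_water, low_water):
    """Return the names of data sets to migrate, least recently used first.

    data_sets is a list of (name, pages, last_used) tuples. Nothing is
    selected while on-line usage is at or below high_water; otherwise data
    sets are selected until usage falls to low_water or below.
    """
    used = sum(pages for _, pages, _ in data_sets)
    if used <= high_water:
        return []
    plan = []
    for name, pages, _ in sorted(data_sets, key=lambda ds: ds[2]):
        if used <= low_water:
            break
        plan.append(name)
        used -= pages
    return plan


# Hypothetical on-line data sets: (name, pages, date last used).
online = [
    ("A", 12000, date(1969, 6, 20)),
    ("B", 9000, date(1969, 1, 5)),
    ("C", 8000, date(1969, 3, 1)),
    ("D", 5000, date(1969, 6, 30)),
]
print(plan_migration(online, high_water=30000, low_water=25000))
```

A routine of this shape could be called periodically by the system, which would remove the need for an operator to decide when to run RMPS or MPS.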

In a parallel effort additional information on the
usage of data sets and on-line storage will be accumulated.
As a simple example, we intend to add to the
‘date last used’ which we now record on the data set,
information about the frequency of use of the data set.
We hope then to be able to form reasonable judgments
about which data sets to select for migration on the
basis of this additional information. We also expect to
take advantage of accounting routines to acquire information
about the users and their use of the system.
By accumulating as much information as possible we 
will be able to formulate more and more reliable criteria 
for the usefulness and currency of on-line data sets. 

SUMMARY 

In summary, we have seen that the size of the TSS/360 
on-line data base increases rapidly with use of the 
system. Since a limited amount of on-line storage is 
available, it is necessary to control this growth. Ob- 
serving that at any time much on-line information is 
not being used, we have formulated a systematic 
method of allocating data sets to on-line or archival 
storage based on some criteria of usefulness. The 
elementary scheme put into operation at our instal- 
lation has proven of great value in containing the on- 
line data base while giving the users an environment 
in which to expand their applications and use of the 
system. 

We have come to several conclusions regarding the 
maintenance of our on-line data base which we re- 
state here. 

1. Some means of controlling the size of the on- 
line data base is absolutely essential for the 
continued operation of the system in our en- 
vironment with our limited amount of on-line 
storage. 

2. On the basis of our experience thus far, it is 
sufficient to examine the usage of data sets on 
a weekly basis or even less frequently to keep 
our on-line data base of manageable size. We 
do our cleaning up operations at approximately 
two-week intervals, with migration being car- 
ried out when necessary to reduce the size of 
the on-line data base to the desired value.

3. It appears that the amount of space gained by 
moving less used data sets to archival storage 
more than repays the effort involved. Most of 
the data moved to archival storage have stayed 
there. This is in part an indication that the
criterion we have used for migration is a reasonable
one, at least for our installation.

We intend to expand this scheme to make it as 
unobtrusive as possible while still continuing its work 
of limiting the size of our on-line data base. In addition 
we will continue accumulating information on the 
characteristics of our users and their interactions with 
the system so as to formulate the most significant 
criteria possible for migration.

INTRODUCTION

In the past quarter century, from MARK 1 (1944) and
ENIAC (1946) to IBM-360/195 and CDC-7600, the
information processing community has progressed in 
diametrically opposite directions. On the one hand, 
the hardware 1 and software 2 development has been 
toward a general purpose computer system. On the 
other hand, the computer users often dedicate a general 
purpose computer for a special application where only 
parts of the computer system resources are used. 

There are rarely any special computer applications 
which will utilize a general purpose computer to its 
full capacity in a balanced fashion. A corporate in- 
formation system with its comprehensive application 
spectrum 3 will exploit the full potential of a general 
purpose computer system. 

To assure a successful marriage between the com- 
prehensive corporate information system and a general 
purpose computer system, one must resolve the problem 
of how to direct the different application subsystems 
to the desirable computer resources. After this problem 
is resolved, all the application subsystems within the 
corporate information system will operate in a homo- 
geneous environment to obtain the optimum efficiency 
of the system. 

The rest of the paper will describe a resource man- 
agement subsystem designed and implemented with 
the above objective in mind. 

System requirements 

The following requirements are essential for an ef- 
fective resources management subsystem for a large 
corporate information system. 

1. Provide for the Orderly Execution of Programs 

All applications and systems programs must 
function in harmony within a large corporate 
information system to ensure reliable and ef- 
ficient operation. For instance: 

• Coordinating the execution of various 
asynchronous subsystems: 

• File/data management subsystem 
• Teleprocessing subsystem 
• Systems service facilities 
• Various applications subsystems 

• Preventing contention for hardware 
and software facilities to provide optimal 
use of these resources. 

2. Support of a Variety of Applications Subsystems 

A comprehensive corporate information system 
must contain many operationally independent 
yet logically interrelated application subsystems. 
The resources management subsystem should 
be able to support applications in a manner 
best suited to the needs for each individual 
project without placing undue restrictions on 
the others. 

Following are a few representative types of 
applications. 
• Real-time, high volume, random arrival 
transactions require large numbers of pro- 
grams to process them. For example, a 
simple inquiry response application which 
services the general public. 

• Real-time, low volume, random arrival 
transactions require a small number of 
rather complex programs to handle them. 
For example, on-line computation and up- 
date of the data base requires more proc- 
essing and safeguard considerations than 
does the inquiry type of transaction. 

• Batch, extra high volume sequential in- 
puts require extraordinarily complex proc- 
essing logic. For example, maintaining a 
master file with several million large records 
involves the insertion, deletion, and up- 
date of files as well as the detail analysis 
of the data. It also includes the generation 
of scores of reports and intermediate files 
as input to other subsystems. 

• Batch, medium or low volume processing 
uses time-consuming multiple file searching 
strategy to produce summary reports. 
For example, as the result of exception 
conditions many interrelated data are 
analyzed, summarized, and reported to 
aid management’s decision making. 

3. Operate in a Multi-Level, Multi-Programming 
Mode 

Because of high volume, real-time applications 
which process a large number of messages, con- 
ventional multi-programming techniques can- 
not keep abreast of the traffic. Therefore, it is 
necessary that the system be operated in a 
multi-level multi-programming, or subtasking, 
environment within a single region of main 
memory. 

For flexibility in operation, the system should 
also have the following capabilities : 

• Ability to suspend a subtask in one region 
without endangering the operation of the 
other subtasks in the same region. 

• Ability to transfer a subtask out of a region in 
order to be able to schedule a different 
subtask in its place. 

• Ability to change priority dynamically to provide 
the best system throughput. 


4. Simplicity of Operation 

When operating a real time system with a large 
network of remote terminals, events occur far 
more rapidly than the best computer operator 
can respond to them. Therefore, the resources 
management subsystem should have facilities 
to perform the following functions : 

• Minimize the job control data required. 

• Minimize operator decision and interven- 
tion. 

5. Test Mode of Operation 

In order to properly test programs to be placed 
in such a complex environment, a test mode of 
operation should be provided to perform the 
following functions : 

• Allow program testing in an operational 
environment. 

• Assure data security for those files which 
are accessed from or directed to the cor- 
porate database. 

• Assure that no actual updating of any of 
the corporate data base files occurs either 
purposely or accidentally. 

6. Statistic Acquisition and Reporting 

A continuous evaluation of the system should 
be maintained on a day-to-day basis. Statistics 
can be gathered during the processing day and 
reported at the end of the day. This permits 
responsible personnel to evaluate the system. 

7. Open-Ended Design 

The resources management control program 
should have an open-ended design for ease of 
system expansion and modification. This will 
reduce the impact of changes to the control 
program on any existing subsystem or applica- 
tions program. 

System structure 

Figure 1 illustrates the hierarchical structure on 
three levels of multi-programming. 

The first level utilizes the multi-programming 
capabilities of software supplied by the computer 
manufacturer. We call it the “host operating system.” 
At this level the resources management subsystems 
will execute concurrently with other processors, such 
as COBOL, FORTRAN, or some other major appli- 
cation subsystems. The second level of multi-pro- 
gramming is accomplished by the regional supervisors 
(resources management subsystems) or the so-called 
“subtasking facility.” This means that many inde- 
pendent programs (subtasks) can be run concurrently 
within a region or partition of the host operating 
system. The third level of multi-programming is 
achieved by means of reentrant modules (or pure code). 
If a module does not modify any part of its own code 
during processing, the module is capable of parallel 
processing of multiple transactions. 
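
The third level can be illustrated with a short sketch. It is written in Python rather than in the assembly language of the system described, and the names `process_transaction` and `run_concurrently` are hypothetical. It shows only the property stated above: a module that keeps all working data in an area supplied by its caller, and never alters itself or any shared state, can serve several transactions at once.

```python
from concurrent.futures import ThreadPoolExecutor


def process_transaction(work_area):
    """Reentrant routine: reads and writes only the caller's work area."""
    total = 0
    for amount in work_area["amounts"]:
        total += amount
    work_area["response"] = f"{work_area['id']}: total={total}"
    return work_area


def run_concurrently(work_areas, workers=3):
    """Run the same routine over several work areas at the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the work areas were given.
        return list(pool.map(process_transaction, work_areas))


results = run_concurrently([
    {"id": "T1", "amounts": [1, 2, 3]},
    {"id": "T2", "amounts": [10, 20]},
    {"id": "T3", "amounts": []},
])
```

Because no state lives in the routine itself, one copy of the code suffices however many transactions are in flight.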

System logic flow 

For an overview of the P. G. and E. corporate in- 
formation system, refer to Figure 2. You will notice 
that the manufacturer’s operating system facility 
serves as the host to control the overall operation. 
On the left of the figure is the front end of the tele- 
processing subsystem performing such functions as 
polling, addressing of the terminal network, message 
queuing and dispatching, terminal hardware error and 
security checking, etc. The next major part of the 
TP subsystem is the input message editing (TPCHUG) 
and output message editing package. This package 
makes all the message processing programs indepen- 
dent of terminal hardware. At the base of the figure, 
the file management subsystem 4 centralizes control 
of all I/O functions; it exchanges information from the 
information system data base and therefore, makes 
the data independent of applications programs (or 
message processing programs). Beneath the operating 
system we find the resources management subsystem 
functioning as the regional supervisor for all the sub- 
tasks within the on-line region and, therefore, we can 
support different application subsystems in the same 
region. We can also open another region for on-line 
message processing and share the file management, 
teleprocessing and resources management subsystems. 


(I) REGION (PARTITION) 

(II) SUBTASK 

(III) PROCESS QUEUE 

Figure 1 — Hierarchical control on three levels of multi- 
programming 
Figure 2 — Host operating system 

On the right side of the figure, we can open a number 
of background regions for computing or batch type 
jobs, or we can have the Batch Executive Sub-System (BESS) 
control the subtasking in the batch region. The rest 
of this section will briefly describe how a typical on- 
line region or a batch region works under the resources 
management subsystem. 

The on-line system 

The Controller of On-Line Processing (COP) is 
designed to control the execution of the on-line portion 
of the system. Figure 3 graphically depicts how the 
computer’s main memories are allocated during the 
on-line processing day. 

• Initialization of the On-Line System 

At the start of each on-line day, two subsystems 
must be initialized. COP coordinates the actions 
of these two subsystems in preparation for the 
on-line execution of message processing programs. 

• First, the teleprocessing subsystem must 
open the lines to the remote terminals and 
initialize the queues where the messages will 
await their turn to be transmitted and processed. 

• In the meantime, the initialization routines of 
COP are busy establishing the necessary con- 
trol blocks and data pools (see Appendix I). 

• A program control block will be built for each 
message processing program. A master control 
block will be built for each transaction response 
pool (TRP) . The number of TRPs to be used 
is specified by a parameter that is brought in 
with other start-up control data such as the 
number of event control blocks (Figure 4). 

• Subsequently, COP will cause the file manage- 
ment subsystem to build file/data control 
blocks, record descriptors, and data buffer pools. 
When all the on-line files are opened and ready 
for processing, the file management subsystem 
will notify COP, and COP in turn will wait 
until the teleprocessing subsystem has com- 
pleted its initialization processes. 

Figure 4 — Control blocks for on-line processing 

• As soon as the teleprocessing routines have 
signaled their start of processing, COP repeats 
a sequence of actions to prepare message proc- 
essing subtasks for multi-thread on-line opera- 
tion. COP does this by establishing one sub- 
task for each TRP and by signaling the operating 
system to start the teleprocessing get-a-message 
program (TPCHUG). 

• Figure 5 shows a path from COP to TPCHUG, 
but this path is not taken until all subtasks 
have been established. For example, assume 
that five streams of messages will be processed 
concurrently. COP will signal the operating 
system five times that a subtask is to be es- 
tablished starting with TPCHUG, and before 
releasing control to any of these subtasks, COP 
will again signal the operating system that 
five more subtasks are to be established starting 
with Subtask Linkage (SLINK). All ten of 
these subtasks are conditionally executable 
based on ten event control blocks which will 
be marked by COP. COP marks the five 
event control blocks for TPCHUG to show that 
the five associated TRPs are free, ready to ac- 
cept input messages. SLINK’s event control 
blocks are marked so that SLINK will wait 
until a message in a TRP is ready to be proc- 
essed. 

• COP will now wait for any one of five event 
control blocks to be marked by TPCHUG. 
It is while COP is waiting that Path 1 is taken 
and TPCHUG gets control for the first time; 
and the system is now active. 
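
The start-up sequence just described can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: an event control block is reduced to a single posted flag, and the names `ECB`, `MasterControlBlock`, and `initialize` are hypothetical. The sketch shows the ordering in the text: all TPCHUG subtasks are requested, then all SLINK subtasks, and only then are the TPCHUG event control blocks marked to show that the TRPs are free.

```python
from dataclasses import dataclass, field


@dataclass
class ECB:
    """Event control block reduced to a single 'posted' flag (assumption)."""
    posted: bool = False

    def post(self):
        self.posted = True


@dataclass
class MasterControlBlock:
    """One per TRP; holds the event control blocks tied to that TRP."""
    trp_id: int
    tpchug_ecb: ECB = field(default_factory=ECB)  # posted: TRP free for input
    slink_ecb: ECB = field(default_factory=ECB)   # posted: a program should run
    cop_ecb: ECB = field(default_factory=ECB)     # posted: COP must service TRP


def initialize(num_trps):
    """Build the control blocks and the list of subtasks COP requests."""
    blocks = [MasterControlBlock(trp_id) for trp_id in range(num_trps)]
    # All TPCHUG subtasks are requested first, then all SLINK subtasks.
    subtasks = [("TPCHUG", b.trp_id) for b in blocks]
    subtasks += [("SLINK", b.trp_id) for b in blocks]
    # Only now are the TPCHUG blocks marked: each TRP is free for a message.
    # SLINK blocks stay unposted, so SLINK waits until a message is ready.
    for b in blocks:
        b.tpchug_ecb.post()
    return blocks, subtasks


blocks, subtasks = initialize(5)
```

With five TRPs this yields ten subtasks, matching the example in the text.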

• Active On-Line System 

TPCHUG makes use of the system support rou- 
tine FINDREC to locate the place in the TRP 
where the incoming message and its internal for- 
mat are to be placed. TPCHUG will do the fol- 
lowing things: 

• Get an incoming message from the waiting 
queue if one is available. If it is not, it will 
wait until there is one so that other subtasks 
may be executed. 

• Place the message in the TRP. 

• Edit and translate the message into the in- 
ternal format. 

• Mark COP's event control block for this TRP 
to show that a message is available for proc- 
essing. 

• Wait until TPCHUG's event control block 
is again posted by COP to show the TRP is 
again ready to accept a message. 

Figure 5 — Flow of program control during on-line 
processing 

When TPCHUG waits, Path 2 back to COP is 
effectively taken. COP will then check its event 
control blocks to determine which TRP requires 
service. The following actions are taken by COP: 


• Via a program control list constructed by TP- 
CHUG, COP will determine the next module 
to be applied to the current message in a TRP. 

• Mark SLINK's event control block for this 
TRP to show that action is to be taken. 

• Wait for the event control blocks to be marked 
by TPCHUG or SLINK. 

Path 3 is now completed and SLINK will gain 
control when this subtask is made active. At this 
time the following functions will be performed: 

• The required applications program will be 
loaded, if it is not already in core memory. 

• Control will be given to the appropriate pro- 
gram so that it may execute. 

• Upon return from the program, SLINK will 
mark COP's event control block to show that 
the program has completed its processing. 

• Wait upon its event control block for this TRP. 

Paths 4, 5, and 6 have been taken, and the same 
sort of thing occurs for Paths 7, 8, 9 and 10. 
The main difference is that when TPMSGOUT 
has finished putting the response(s) on the out- 
put waiting queue, COP's event control block 
is complete and the TRP is now free to be used 
again. 
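
The hand-offs among TPCHUG, COP, and SLINK for a single message can be traced with a small single-threaded simulation. It is a sketch only: the function `run_message`, the dictionary layout of the TRP, and the stand-in "programs" that merely tag the message are assumptions made for illustration, and real waiting on event control blocks is replaced by entries in a trace list.

```python
from collections import deque


def run_message(raw_message, programs):
    """Trace one message through one TRP; returns (trp, trace, output_queue)."""
    trace = []
    output_queue = []
    trp = {"message": None, "program_list": deque(), "responses": [], "free": True}

    # TPCHUG: claim the TRP, edit the message, build the program control list.
    trp["free"] = False
    trp["message"] = raw_message.strip().upper()
    trp["program_list"] = deque(list(programs) + ["TPMSGOUT"])
    trace.append("TPCHUG posts COP")

    # COP: while modules remain, hand the next one to SLINK and wait.
    while trp["program_list"]:
        name = trp["program_list"].popleft()
        trace.append(f"COP posts SLINK for {name}")
        # SLINK: give control to the program, then post COP on return.
        if name == "TPMSGOUT":
            output_queue.extend(trp["responses"])
        else:
            trp["responses"].append(f"{name}({trp['message']})")
        trace.append(f"SLINK posts COP after {name}")

    # Last step: the TRP is released so TPCHUG can accept another message.
    trp["free"] = True
    trace.append("COP posts TPCHUG: TRP free")
    return trp, trace, output_queue


trp, trace, output_queue = run_message(" inquiry 42 ", ["VALIDATE", "LOOKUP"])
```

Each module in the program control list costs one round trip between COP and SLINK, which is why the trace grows by two entries per module.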

• Termination of the On-Line Day 

Messages may be in waiting queues or in various 
stages of processing when termination of the on- 
line system occurs. COP must assure that the 
teleprocessing subsystem has received all incoming 
messages, the input waiting queues have all been 
emptied and all messages have completed proc- 
essing before the file management routines close 
the files and COP releases the subtasks. While 
the teleprocessing programs are emptying the 
output waiting queues of messages and trans- 
mitting them, COP is editing the statistics which 
have been gathered that day and producing a 
report from them. After all processing has been 
completed, control is returned to the operating 
system. 

Batch system 

• Initialization 

Control information must be gathered and set 
up in main storage to effect the proper sequence 
of jobs to be run during a particular batch job 
stream. The aforementioned control information 
will contain such things as: 
Figure 6 — BESS program control blocks 

• Program Name. 

• Resultant condition code, if job fails to run. 

• Condition code or codes resulting from other 
jobs which this job depends on in order to know 
whether or not to execute. 

• Program level denoting whether or not this 
job can run concurrently with other jobs 
based on core usage, sequence of jobs to run and 
shared use of data files. 

• Names and device locations of conditional 
files; i.e., files which might or might not be 
present during any one job within a job stream. 

• Processing 

During the so-called batch processing, depending 
on the information supplied for control during 
initialization time, jobs will be scheduled either 
alone or as subtasks, depending on core require- 
ments, availability of data files, and shared access 
of data files. 

• Termination 

Upon completion of all the jobs which could be 
processed, the operator or other appropriate per- 
sonnel will be notified as to the status of all the 
jobs. This includes such information as: 

• Jobs which were not run due to data not being 
available. 

• Jobs which were not run to completion due to a 
program failure. 

• Jobs which were run to completion. 

• Other information which will help key person- 
nel to effectively evaluate their choice of action. 

Figure 7 — BESS status report 

System facilities 

This section will describe some system support 
facilities implemented to enhance the reliability of 
the system. 

On-line system’s facilities 

• Program Test Mode of Operation 

Any new or updated version of a message process- 
ing program which is to be added to the live 
system is first run in the test mode even though 
the program has gone through unit, system, and 
high level testing. 

If it is an updated version of an operational pro- 
gram, the last copy will be available as back- 
up on the library. If it is to update files it will 
not be permitted to do so directly, but updated 
records will be saved, verified, and then added 
to the files in batch runs. Should the program fail 
during the test, the old reliable copy will be 
brought in and normal processing will continue. 

With new programs, no backup copy is available 
so if it fails, it is dropped from the system. A 
new program also affects the rest of the system 
in that while it is executing, any production 
programs that might fail will be refreshed and 
information processed by the new program will 
also be included in the output. 

• Trouble Procedures 

Trouble procedures and error defaults have been 
developed. 

• Excess Elapsed Time 

Should a program be in memory too long 
(elapsed time) to process a message, its priority 
will be raised so that it can get a greater share 
of the CPU's time. 

• Excess Processing Time 

Before control is turned over to a processing 
program, a timer is set by SLINK to the num- 
ber of machine cycles to be allocated to this 
program for its processing. The number of 
machine cycles allocated is based upon operating 
history plus a safety factor. If a program should 
exceed the number of machine cycles allocated 
to it, an interrupt will occur and control will 
be returned to COP. 
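
The cycle allocation can be sketched as below. The text says only that the allocation rests on operating history plus a safety factor, so the formula here (largest observed count scaled by a fractional safety factor) is an assumption, as are the names `cycle_budget`, `run_with_budget`, and `BudgetExceeded`. A running total of step costs stands in for the hardware timer and its interrupt.

```python
class BudgetExceeded(Exception):
    """Raised in place of the timer interrupt that returns control to COP."""


def cycle_budget(history, safety_factor=0.25):
    """Assumed rule: largest observed cycle count plus a fractional margin."""
    return int(max(history) * (1 + safety_factor))


def run_with_budget(step_costs, budget):
    """Consume the program's steps until done or the budget is exceeded."""
    used = 0
    for cost in step_costs:
        used += cost
        if used > budget:
            raise BudgetExceeded(f"used {used} of {budget} cycles")
    return used


budget = cycle_budget([800, 1000, 900])
normal_run = run_with_budget([400, 400, 400], budget)
```

A run that stays within the allocation returns normally; one that overruns it is cut off at the step where the total first exceeds the budget.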

When a program is terminated due to excessive 
operating time or has abnormally ended, a snapshot 
of the TRP and all associated control blocks is taken. 
An error message is returned to the terminal from 
where the transaction was entered and the control 
block for the program is marked to show that an error 
has occurred. If the program is used by more than one 
type of transaction, or is critical to system operation, 
a fresh copy will be brought into memory. If the pro- 
gram is not normally resident, a fresh copy will be 
brought in before it is used again. 

If a program that exceeds the processing time is run- 
ning in the test mode, it will not be used again, but a 
backup version will be used instead. If two successive 
errors occur, the program will be marked unavailable 
and the type of transaction that precipitated the 
error will be rejected until manual action is taken to 
reinstate them. 
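
The recovery rules in the two preceding paragraphs amount to a small decision procedure, sketched here under assumptions. The names `ProgramStatus` and `record_outcome` are hypothetical, the snapshot and terminal error message are omitted, and it is assumed that a successful run resets the count of successive errors, which the text implies but does not state.

```python
from dataclasses import dataclass


@dataclass
class ProgramStatus:
    name: str
    test_mode: bool = False
    successive_errors: int = 0
    available: bool = True
    use_backup: bool = False


def record_outcome(status, ok):
    """Update a program's status after one run and return the action taken."""
    if ok:
        status.successive_errors = 0  # assumption: success breaks the streak
        return "continue"
    status.successive_errors += 1
    if status.successive_errors >= 2:
        # Two successive errors: the program is marked unavailable.
        status.available = False
        return "reject transactions until reinstated"
    if status.test_mode:
        # A program under test is not reused; the backup version takes over.
        status.use_backup = True
        return "switch to backup version"
    return "reload fresh copy"


lookup = ProgramStatus("LOOKUP")
actions = [record_outcome(lookup, ok) for ok in (False, True, False, False)]
```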

• Accumulation of Statistics 

During the operation of the on-line system, COP 
is gathering statistics about the various programs. 
These statistics are recorded in the appropriate 
program control block, and a report is produced at 
the end of the day (Figure 8). COP is also recording 
each TRP so that they may be analyzed after the 
day's run. 


Batch Executive Sub-System (BESS) 

• Improved Throughput 

Some operational batch jobs that ran under 360- 
OS-MVT (Multiprogramming with Variable num- 
ber of Tasks) were compared to multi-tasking under 
BESS and MVT. They showed a 60 percent 
savings of time under BESS. 

This can be attributed to : 

• Subtasking 

Running two or more of the jobs concur- 
rently as subtasks to BESS within an MVT 
region makes it possible to take the 
greatest advantage of the computer's resources. 

• Reduction of Job Control Information 

All the jobs are run as subtasks to BESS and 
not as individual jobs. This eliminates the need 
for a great deal of the job control language 
cards for each program. The control informa- 
tion required by BESS is minimal and takes 
less time and space to read, interpret, and use 
(Figure 6). Only one control record per program 
is required. 

• Sharing of Record Accesses 

Each time a record is read from a direct access 
or sequential storage device, it can be shared 
by two or more programs that are run as con- 
current subtasks to BESS. There are restric- 
tions such as only one program may update 
the file and all must look at the records in the 
same order or fashion. This savings of record 
access times is appreciable for large files in a 
corporate data bank. 
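
The saving from shared record accesses can be seen in a minimal sketch. The function `shared_pass` and the sample records are hypothetical. Each record is read once and handed to every program that needs it, so the number of reads is independent of the number of programs, provided all of them accept the records in the same order and at most one of them updates the file.

```python
def shared_pass(records, consumers):
    """Read each record once and hand it to every consumer in turn."""
    reads = 0
    for record in records:          # one physical read per record
        reads += 1
        for consume in consumers:   # every subtask sees the same record
            consume(record)
    return reads


totals = {"amount": 0}
overdue = []


def add_amount(record):
    totals["amount"] += record["amount"]


def collect_overdue(record):
    if record["days_late"] > 30:
        overdue.append(record["id"])


master_file = [
    {"id": 1, "amount": 120, "days_late": 0},
    {"id": 2, "amount": 80, "days_late": 45},
    {"id": 3, "amount": 200, "days_late": 31},
]
reads = shared_pass(master_file, [add_amount, collect_overdue])
```

Run as two separate jobs, the same work would read the file twice.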

• Job Scheduling 

The programs to be run are dependent upon the 
data that was collected during the on-line day. 
BESS will determine which programs are to be 
run and in what order. 

• Conditional Runs 

Part of the control information supplied to BESS 
is the input file (or files) which must contain 
data if the program is to be executed. If data 
is present, BESS will schedule the program for 
execution. If the data is not present, BESS will 
not schedule the program. 

Another part of the control information is a 
set of inhibit codes for the program. Each pro- 
gram that does not run correctly to completion 
will cause a condition code to be set and if the 
particular combinations of inhibit codes for a 
program are satisfied, it will not be run. This 
saves the time of loading and attempting to 
execute a program that should not be executed. 

Figure 8 — COP status report 

• Maximum Use of Resources 

Each program’s control information contains 
a level code which shows two things. The first 
is the order of execution. A program with a 
level code of three cannot be run until all pro- 
grams with level codes less than three have 
been run or rescheduled. The second function 
of the level code is to designate groups of pro- 
grams which may be run concurrently without 
conflict. 

With this information, BESS can execute as 
many subtasks as the computer’s resources 
permit. 
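
The scheduling rules above (required input data, inhibit codes, and level codes) can be combined in one sketch. The layout of a BESS control record is not given in the text, so `ControlRecord`, `should_run`, and `plan` are hypothetical. Two further assumptions are made: an inhibit combination is a list of (job, condition code) pairs that must all match, and programs sharing a level code form one concurrent group. Condition codes are taken as already set by earlier runs.

```python
from dataclasses import dataclass, field


@dataclass
class ControlRecord:
    name: str
    level: int
    required_files: list = field(default_factory=list)
    # Each entry is one combination of (job, condition code) pairs.
    inhibit_combinations: list = field(default_factory=list)


def should_run(record, files_with_data, condition_codes):
    """Apply the conditional-run and inhibit-code checks to one program."""
    if not all(f in files_with_data for f in record.required_files):
        return False
    for combo in record.inhibit_combinations:
        if combo and all(condition_codes.get(job) == code for job, code in combo):
            return False
    return True


def plan(records, files_with_data, condition_codes):
    """Return groups of program names, one group per level, in level order."""
    groups = {}
    for record in sorted(records, key=lambda r: r.level):
        if should_run(record, files_with_data, condition_codes):
            groups.setdefault(record.level, []).append(record.name)
    return [groups[level] for level in sorted(groups)]


records = [
    ControlRecord("BILLING", 1, ["PAYMENTS"]),
    ControlRecord("METERS", 1, ["READINGS"]),
    ControlRecord("LEDGER", 2, [], [[("BILLING", 8)]]),
    ControlRecord("REPORTS", 3, ["PAYMENTS"], [[("BILLING", 8), ("METERS", 8)]]),
]
no_readings = plan(records, {"PAYMENTS"}, {"BILLING": 0})
billing_failed = plan(records, {"PAYMENTS", "READINGS"}, {"BILLING": 8})
```

Each inner list is a group that may run as concurrent subtasks; the groups themselves run in level order.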

• Other Features 

This approach to controlling batch runs has two 
other major assets. 

• Better Error Control 

If a program abnormally terminates, BESS 
will regain control and can invoke error pro- 
cedures to salvage as much as possible from the 
run. By doing so, all other parallel jobs can con- 
tinue running without interruption and most of 
the work can still be done while a fix is being im- 
plemented to correct the error. 

• Piggy Back Programs 

There are a variety of programs that are run 
only once or very rarely. These programs do 
not update the files but only look at the records 
to perform some analysis. These programs can 
ride piggy back on some other run in the system 
under BESS. The added time is negligible 
compared to the cost of passing the voluminous 
files used in a large corporate information system. 

• Reduction of Testing Time for Application 
Subsystem 

An application subsystem within a corporate 
information system often contains a goodly 
number of interrelated program modules. 

During the system integration testing, if only 
one of the modules fails, it will cause the whole 
job to fail. Therefore, to test all the modules in 
a combined environment can be very time- 
consuming. 

However, under BESS each module can be 
classified as a subtask. If a subtask fails, the 
entire job will not terminate and BESS can 
proceed to test the remaining modules. Figure 
7 shows a report by BESS to facilitate the 
analysis of the testing. 

System Implementation 

The Resources Management Programs have been 
written in IBM-360 Operating System Assembly 
Language (ALC). 

1. It is fully interfaced with the IBM-360 operating 
system MVT. 

• The applications programs and the systems 
programs operate as independent subtasks 
of the regional resource manager; abnormal 
termination of a subtask will not stop the 
remaining subtasks in the region. 

• The package is not tied to any particular 
release of O/S; hence, if a new version is 
released, there should be little effect on this 
package. 

2. The Resources Management packages take 
full advantage of existing operating system 
facilities and make extensive use of the sub- 
tasking and master scheduler facilities. 


3. It is intended to interface with all the operating 
system supported languages (COBOL and ALC 
interface have been implemented). 

4. The entire package has been designed to be 
dynamic in nature; that is, all programs are 
load modules. They are not linkage edited 
into the applications program; thus, the pack- 
age may be redesigned and improved without 
any appreciable effect on the applications pro- 
grams. 

5. The entire package has been programmed in 
re-entrant code. 

6. The hardware anticipated over the next several 
years includes two large central processors 
with a million bytes of main memory, supported 
by smaller satellite computers and a score of 
multi-drive disk storage units. The system is 
being designed to support several hundred 
terminals, most of which are expected to be high 
speed CRT display units. 

APPENDIX I 

• On-Line System Control Blocks 

The on-line system makes use of the control 
blocks established during initialization time. These 
blocks are: 

• Program Control Block (Figure 9) 

This control block contains the program identi- 
fication, the storage address within the library, 
operating system controls, on-line control data, 
and counters to record the number of times the 
program was used. It permits the on-line system 
to bring a needed program in from the library 
in an efficient manner. This block is also used by 
COP to collect statistics about the program and 
determine its current status. 

• The Master Control Block for the TRP (Figure 10) 

This block contains pointers to various other 
control blocks as well as containing the two event 
control blocks which apply to the subtasks as- 
sociated with the TRP. One pointer shows the 
location of the TRP, one points to the task con- 
trol block located in the operating system's 
region, and another to the event control block 
that will be marked to tell COP when a program 
has completed processing in this TRP. The next 
one can either be a pointer to the processing pro- 
gram or its identifier depending upon conditions 
at the time. Two event control blocks are next 
and they are the ones used by TPCHUG and 
SLINK, respectively. The next pointer is to 
TPCHUG’s task control block and the last to a 
list of addresses which in turn point to the param- 
eters and records to be used or updated by the 
message processing program. 

1. There is one block for each program that may be run in 
the on-line system. 

2. These blocks are built each day at initialization time. 

3. They may be built during the on-line day by operator 
direction or in the event of system malfunction. 

Figure 9 — Program control blocks used by the 
controller of on-line processing (COP) 

• The Transaction/Response Pools (Figure 11) 

These pools of data and controls are the heart of 
the on-line system. Their number determines how 
many messages can be processed concurrently. 
The first part of the TRP contains control infor- 
mation to help FINDREC in locating the various 
records and spaces within the pool. These con- 
trols also help COP to locate programs and other 
control blocks that are required. Because COP 
is re-entrant, it cannot store information in an 
area reserved for itself, so all controls for a given 
message must be maintained in the TRP apart 
from all others; therefore, no “cross talk” will 
occur between concurrently processing subtasks. 

1. There is one control block for each TRP in the system. 

2. These blocks are built at initialization time along with 
the TRP's and are updated as needed during the on-line day. 

Figure 10 — Master control blocks for 
transaction/response pools (TRP’s) 

The latter part of the TRP is used for the mes- 
sage, its response(s) and overflow control if more 
room should be needed. Currently, 4006 bytes 
of memory for a TRP have been adequate to 
handle 98 percent of the messages processed. Ex- 
pansion is provided by going to an overflow area 
if more room is needed. Figure 12 shows the rela- 
tionship between the control blocks and TRPs. 
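
The division of a TRP into a control part and a data part, with expansion into an overflow area, might look like the following sketch. The actual layout appears only in Figure 11, which is not reproduced here, so everything in the sketch is an assumption: the class `TRP`, the directory keyed by record name, and the method `findrec`, which stands in for the FINDREC routine by either locating an existing record or allocating space for a new one.

```python
class TRP:
    """Hypothetical transaction/response pool with a directory and overflow."""

    def __init__(self, capacity):
        self.capacity = capacity      # bytes available in the pool itself
        self.used = 0                 # bytes already allocated in the pool
        self.overflow_used = 0        # bytes allocated in the overflow area
        self.directory = {}           # control part: name -> (area, offset, length)

    def findrec(self, name, length=None):
        """Locate a record, or allocate space for it when a length is given."""
        if name in self.directory:
            return self.directory[name]
        if length is None:
            raise KeyError(name)
        if self.used + length <= self.capacity:
            entry = ("pool", self.used, length)
            self.used += length
        else:
            # The record does not fit: place it in the overflow area instead.
            entry = ("overflow", self.overflow_used, length)
            self.overflow_used += length
        self.directory[name] = entry
        return entry


trp = TRP(capacity=100)
message = trp.findrec("message", 60)
response = trp.findrec("response", 30)
extra = trp.findrec("extra", 20)
```

Because the directory lives inside the pool object, everything needed to process a message travels with its TRP, which is what lets the re-entrant controller hold no state of its own.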

GLOSSARY 

BESS 

Batch Executive Sub-System 
BUMP 

Branch to Utility Modules and Programs 

This is a small module appended to each high-level 
language program to: 

• find the various system support modules within 
a dynamic environment. 

• establish the linkage between the program and 
the system support module. 

COP 

Controller of On-Line Processing 


DCM 

Data Control Manager 

A subsystem which controls the data base for the 
Management Information System. 

DCMGET 

An on-line DCM subroutine which retrieves selected 
information from the data base for applications 
programs. It makes the applications programs in- 
dependent of the data base structure. 

DCMPUT 

An on-line DCM subroutine which selectively up- 
dates the data base for applications programs. 

ECB 

Event Control Block 

A small block of memory that contains indicators 
to show if a program is waiting for an event to oc- 
cur, if the event has occurred and the completion 
code associated with the event, when posted. 

FINDREC 

A resources management on-line subroutine that 
does dynamic space allocation and locates data 
within the transaction/response pools. 

Multi-tasking 

The interleaved or time-shared execution of two 
or more program tasks within a single CPU; multi- 
programming. 

On-Line 

Pertaining to the responsiveness of a computer 
system: responding in a timely fashion to the needs 
of users who have direct access to the computer via 
data entry devices, terminals and displays; real-time. 

SLINK 

Subtask Linkage 

An on-line resources management program that is a 
constant subtask to COP and links to each process- 
ing program. 

Subsystem 

A system of interrelated programs that is subordinate 
in control and execution to another system. 

Subtask 

An executable program that has all the attributes 
of a task but is subordinate to and under the control 
of another task. 








452 Fall Joint Computer Conference, 1969 


Task 

One of two or more programs, or series of programs 
which execute concurrently in a single CPU. 

TPCHUG 

A teleprocessing program that is a constant sub- 
task to COP. It reads transactions from the input 
waiting queue, edits them, and translates them into 
their processing format. 

TPMSGOUT 

A teleprocessing subroutine that converts a response 
to a terminal's format and places it into the output 
waiting queue. 

TRP 

Transaction Response Pool 

A block of memory which contains a single raw trans- 
action (message), some of its control information, 
its intermediate forms and its response(s). A TRP 
is assigned to one transaction at a time for its active 
life within the CPU. It contains all data associated 
with the transaction in chronological sequence so 
it is useful for debugging. 
Incorporating complex data structures 
into a language for social science 
research 

by STEPHEN W. KIDD 

The Brookings Institution 
Washington, D. C. 


INTRODUCTION 

This paper presents a set of augmentations to the 
language BEAST* (Brookings Economics and Statis- 
tical Translator) as part of a continuing effort to define 
a language for a particular group of computer users, 
social scientists. In this nebulous group we include 
professional economists, political scientists, psycholo- 
gists, sociologists, and a large number of university 
students in those disciplines. An important assumption 
underlying our work has been that the cost of not 
having substantially better software than presently 
exists is very large and should be measured in terms 
of researchers’ time. The true cost of inappropriate 
methods of computer utilization should not be mea- 
sured by staff and computer costs, but by the social 
cost of the output foregone. When answers to questions 
of importance for national public policy formation re- 
quire weeks, months, or even years to obtain, the cost 
becomes a social cost that we all eventually bear. 

BEAST is a computer language designed to embody
many of the concepts of the more quantifiable social
sciences. The present version of the BEAST operates
primarily upon “rectangular” data files, that is, files
having observations on attributes of enumeration
units. In other words, acceptable files consist of fixed
length logical records, one record for each enumeration
unit. Many social science data files either have this
structure or can be cast in this structure with little
effort, and the majority of “general purpose programs”
written for social scientists also assume this data
structure. However, many social science data files have
a more complex structure and cannot be processed
either by the present version of the BEAST language
or by most existing computer programs.

* Jeffrey W. Bean, Stephen W. Kidd, George Sadowsky, Beverly
D. Sharp, THE BEAST: A User-Oriented Procedural Language
for Social Science Research. (The Brookings Institution, June 13,
1968). Reference to “the current BEAST” should be understood
to refer to that paper.

This paper describes possible extensions to the 
BEAST language to make it useful for processing data 
with a more complicated structure. Though the data 
structures and language constructs described here could 
be applied to extensions of other languages, we feel 
that they have particular utility when combined with 
features already available in the BEAST. The intent
of the proposed extensions is not to introduce a general
list processing capability into the language as has been
done with some other languages,8,9,16,18 but to accommodate
a particular class of files characterized by
hierarchical record structures. We have deliberately
decided in favor of a limited structure that permits
the ease of reference that is essential for the users we
envision for the language. The generality of those
complex structures which have been disallowed in
the current proposal is a luxury which can only be
bought for a significant price: the increased specificity
required in a language to reference such structures.
The user who wants such generality pays the price in
other languages in the increased tedium of writing
his program.


Consider a slight variant of the 1966 Survey of
Economic Opportunity (SEO) File constructed by
the U. S. Office of Economic Opportunity. The organization
of data within each enumeration unit is tree-structured,
that is, each level or segment of data may
be followed by a variable number of segments of data
at the next lower level. Figure 1 illustrates the structure
of this file.

Disaggregation by respondent characteristics, time 
period, income group, geographic area or other con- 
ditions is often very fruitful for social science research. 
For example, using this file it should be possible to 
define a subset of households or families based upon 
person characteristics, or the reverse. Such groups 
might be (1) the set of all families such that no persons 
are 65 or more years old, (2) the set of all households 
such that at least two persons earn $5,000 or more 
per year in wages and salaries, (3) the set of all families 
such that exactly two persons are less than 21 years old, 
(4) the set of all persons whose households are headed 
by a woman, (5) the set of all families that live in the 
northeast, and (6) the set of all persons whose families
are at least five persons in size and which live in the
southwest.

Figure 1: A logical structure for the survey of
economic opportunity file

The current BEAST language provides the DEFINE
SAMPLE statement for defining a subset of the user's
original population and the ON SAMPLE suffix for
restricting computations to observations within that
subset. The format of the DEFINE SAMPLE statement is:

DEFINE SAMPLE sample name AS logical expression

An example of a DEFINE SAMPLE statement would
be

DEFINE SAMPLE OLDMEN AS AGE > 65 AND SEX EQ 'M'

That sample definition could be invoked using the ON 
SAMPLE suffix to compute the average income of the 
old men in a set of data:

LET AVINC = MEAN (INCOME) ON 
SAMPLE OLDMEN 

The ON SAMPLE suffix can be used in a similar way 
to define a restricted domain for calculation of derived 
variables, statistical procedures, and input and output. 
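The behavior of a sample on a rectangular file can be illustrated outside BEAST. The following Python sketch is only an analogy, not the BEAST implementation: the records, the `samples` table, and the helper `mean_on_sample` are hypothetical names introduced here, and a sample is modeled as a predicate over one record.

```python
from statistics import mean

# Hypothetical rectangular file: one record per enumeration unit.
records = [
    {"AGE": 70, "SEX": "M", "INCOME": 4000},
    {"AGE": 30, "SEX": "M", "INCOME": 9000},
    {"AGE": 68, "SEX": "F", "INCOME": 5000},
    {"AGE": 81, "SEX": "M", "INCOME": 2000},
]

# Analogue of: DEFINE SAMPLE OLDMEN AS AGE > 65 AND SEX EQ 'M'
# A sample is stored as a named predicate, not as a copy of the data.
samples = {"OLDMEN": lambda r: r["AGE"] > 65 and r["SEX"] == "M"}


def mean_on_sample(attribute, sample_name):
    """Rough analogue of MEAN (attribute) ON SAMPLE sample_name."""
    is_member = samples[sample_name]
    return mean(r[attribute] for r in records if is_member(r))


# Analogue of: LET AVINC = MEAN (INCOME) ON SAMPLE OLDMEN
avinc = mean_on_sample("INCOME", "OLDMEN")
```

Keeping the sample as a predicate mirrors the way a single definition can restrict many later computations.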

While the current definition of the language is 
sufficient to express extremely general conditions on 
rectangular data files, the syntax for logical expres- 
sions is insufficient for defining samples of the type 
mentioned above for the SEO file. The next two sec- 
tions describe an augmented I/O facility and an ex- 
panded conditional expression syntax designed to 
evaluate logical functions on data structures of the 
type indicated. 

Before proceeding further, it is useful to formalize
somewhat the data structure indicated in Figure 1.
Data related to a single entity like a person, a family,
a state, or a company we shall call a segment.* An occurrence
of a segment resembles one row of a rectangular
data matrix: it is one set of values for a list of
attributes, and it is defined by the list of attributes included
in one occurrence of the segment. For example,
a segment describing a person (a PERSON segment)
might be defined by the list of attributes AGE, SEX,
INCOME, and RENT. We denote that a PERSON
segment is composed of values for those four attributes
by writing

PERSON [AGE, SEX, INCOME, RENT]

or in general with the notation

segmentname [attributelist]


* The concept of a segment as described here should not be confused
with its usage in discussions of virtual memories and address
spaces. Our usage is close to what R. M. Balzer has called a
“collection” in “Dataless Programming”, (Rand Corporation
July, 1967) Memorandum RM-5290-ARPA. It also resembles
the usage in COLINGO of “group”: COLINGO C-10 User's
Manual, (Mitre Corporation, May 1968) Document ESD-TR-66-653;
and the POP-2 usage of the term “record.” R. M. Burns,
J. S. Collins, “An Introduction to the POP-2 Programming
Language,” (University of Edinburgh, October, 1967,) Min-
MAC Reports, No. 4. The term segment has been adopted for
IBM's GIS file management system.

Figure 2: A PERSON segment

Figure 2 shows an example of one such PERSON
segment. A rectangular matrix would be composed
of a set of such “segments” conceptually placed one
below the other.

As an extension of that structure, segments can be 
combined by linking them to construct a “tree”. The 
tree has as its “root” a single segment, and has as its 
“branches” one or more different segments. Figure 
3 shows one such tree structure representing one 
FAMILY and three PERSON'S.* ** 

A tree such as in Figure 3 is the basic unit in our 
augmented data structure. 

We shall call each successive tier of the data hierarchy
a level. Levels are numbered and begin at one,
the level for those segment types not contained in any 
other segment. Level one is the highest segment level 
possible. Every segment type has a unique level as- 
sociated with it, though more than one segment type 
may occur at any level. When a segment S is connected 
to segment T by a single path through one or more 
segments, we shall say that S contains T (conversely, 
T is contained in S). All segments contained in segment 
S are called subsegments of S. Segments are contained 
in a unique segment of the next higher level. This 
restriction on the data structure permits simplification of 
the language we use to reference the structures. In 
particular, it permits attributes of segments at one 
level to be “imputed” to segments at a lower level, and 
it obviates explicit upward-references when referring 
to low level segments. 
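The containment rules above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than a description of BEAST internals: the class `Segment` and its methods `add`, `contains`, and `lookup` are hypothetical, and the attribute values are invented. Because every segment has exactly one containing segment, an attribute missing at a low level can be found by walking upward, which is one way to read the "imputation" described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Segment:
    type: str
    attrs: dict
    parent: Optional["Segment"] = field(default=None, repr=False)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach child one level below this segment."""
        child.parent = self
        self.children.append(child)
        return child

    @property
    def level(self):
        # Level one is the root; each step down adds one.
        return 1 if self.parent is None else self.parent.level + 1

    def contains(self, other):
        """True if other is reachable by a downward path from self."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False

    def lookup(self, name):
        """Return an attribute of this segment, or impute it from the
        unique chain of containing segments."""
        node = self
        while node is not None:
            if name in node.attrs:
                return node.attrs[name]
            node = node.parent
        raise KeyError(name)


family = Segment("FAMILY", {"REGION": 3, "WEALTH": 12000})
person = family.add(Segment("PERSON", {"AGE": 40, "SEX": "M"}))
```

In this sketch `person.lookup("REGION")` needs no explicit upward reference, since the containing FAMILY is unique.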

We shall call information about containment (which
segments contain or are contained in which other segments)
structural information about the data, as distinct
from the data itself. The structural information
of a file is often contained only implicitly in the
physical arrangement of the data in the file. When data
are read into memory, the structural information
should become explicitly represented as a list structure
for efficiency in processing.

* These figures give no indication of the physical structure of
the data. There are several reasonable ways in which such data
could be arranged, but the language used to talk about such data
should be independent of the physical arrangement of the data.

** For convenience we will call “an occurrence of the structure
defined by X” simply “an X”.

Figure 3: A simple enumeration unit

Trees of the forms described above can often represent
naturally the structure of the enumeration units
(EU's) encountered in social science research. For
the purposes of this paper the tree that represents 
an enumeration unit consists of a unique segment type 
at level one called the root segment together with all its 
subsegments. A file is an ordered set of such enumeration 
units. To denote that an enumeration unit has a 
structure we shall give the entire aggregate a name and 
define its constituents according to their relations. The 
simple tree structure in Figure 3 would be defined in 
BEAST by writing 

DEFINE EU FAMSTRUCT AS
    1 FAMILY [REGION, WEALTH, URBANRURAL]
    2 PERSON [AGE, SEX, INCOME, RENT]

The purpose of such a definition is to describe the set 
of possible occurrences of the enumeration unit, since 
an EU definition says nothing about whether a particu- 
lar occurrence of the structure will actually have any 
subsegments, the number of subsegments, or the phys- 
ical order or type of the attributes in the segments. 

Another example of an EU definition is: 

DEFINE EU CONGCOMMITTEE AS
    1 COMMITTEE [NAME, BUDGET]
    2 MEMBER [LAST, FIRST, STATE, PARTY]

This definition specifies a tree structure with two levels
that represents a Congressional committee. The root
segment is a COMMITTEE segment, and for the purposes
of CONGCOMMITTEE it has only two attributes,
NAME and BUDGET. Segments of type COMMITTEE
are assumed to contain only segments of type
MEMBER. On input only structural information
relating COMMITTEE and MEMBER segments will
be retrieved from the file though the file may contain
other segment types and attributes. On output only
the structural information indicated will be displayed.
As a third example consider the structure defined by the
statement

DEFINE EU DWELLING AS
    1 HOUSEHOLD [AGEOFHEAD, SEXOFHEAD]
    2 FAMILY
    3 PERSON [AGE, INCOME]

When a segment name (FAMILY) is included in an
EU definition with no attributes listed, then only the
structural information at that level is extracted from
the file. In this example, DWELLING would have the
form indicated in Figure 4. With such an EU structure
one could evaluate logical expressions that required
structural information, but no attributes, at the family
level. We might, for example, reference

PERSON'S IN FAMILY'S WITH AT LEAST 4 PERSON'S

No FAMILY attributes are needed because only
structural information is required to evaluate this
expression.

Figure 4: Example 3

New attributes for FAMILY segments could also 
be generated from such a structure that begins with a 
null attribute list. We could compute the total income 
in each FAMILY segment (the sum of all PERSON'S 
income contained in that segment) using the TOTAL 
function: 

LET FAMINC = TOTAL INCOME WITHIN 
FAMILY 


The function TOTAL has the general form

TOTAL {attribute | segmntid} WITHIN segmntid [subscript] [boolprim]*

In the example above we have taken the total of an
attribute (INCOME), where the summation is taken
over all values for INCOME contained within the
specified segment, FAMILY. BEAST assumes that
iteration is intended over all FAMILY segments since
no subscript or modifier is put on the segment identifiers.

To explicitly assign a new attribute to a segment we 
will use the notation 

LET segmntid: attributename = expression 

Using this notation and the TOTAL function to count
PERSON subsegments, we can create an attribute
in each FAMILY segment equal to the average income
of all persons in the FAMILY:

LET FAMILY: AVINC = (TOTAL INCOME WITHIN FAMILY) /
                    (TOTAL PERSON'S WITHIN FAMILY)

Again iteration over FAMILY segments is implied 
because the segment identifier is unqualified. The 
value of the function would become a scalar if the 
second segment identifier were qualified with either a 
simple logical condition or a BEAST subscript. 

For example, the statement

LET X = TOTAL INCOME WITHIN FAMILY'S (1...100)

would compute the sum of the income of all persons
contained in the first 100 families and assign the value
to the scalar variable X.

* The syntactic type boolprim represents a single logical term.
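The two uses of TOTAL (summing an attribute and counting subsegments) can be sketched in Python as follows. This is a hedged illustration only: the function name `total_within`, the dictionary layout of a family, and the data values are assumptions made for the example, and returning `None` as the average for a family with no persons is a choice made here that the text does not specify.

```python
# Hypothetical in-memory form: each FAMILY holds a list of PERSON records.
families = [
    {"PERSON": [{"INCOME": 5000}, {"INCOME": 3000}]},
    {"PERSON": [{"INCOME": 7000}]},
    {"PERSON": []},
]


def total_within(segments, child_type, attribute=None):
    """One value per containing segment: the sum of `attribute` over its
    subsegments, or the number of subsegments when no attribute is given."""
    if attribute is None:
        return [len(s[child_type]) for s in segments]
    return [sum(c[attribute] for c in s[child_type]) for s in segments]


# Analogue of: LET FAMINC = TOTAL INCOME WITHIN FAMILY
faminc = total_within(families, "PERSON", "INCOME")

# Analogue of: TOTAL PERSON'S WITHIN FAMILY
npersons = total_within(families, "PERSON")

# Analogue of: LET FAMILY: AVINC = (TOTAL INCOME ...) / (TOTAL PERSON'S ...)
avinc = [t / n if n else None for t, n in zip(faminc, npersons)]

# Qualifying the containing segment with a subscript, here the first two
# families, collapses the result to a single scalar.
x = sum(total_within(families[:2], "PERSON", "INCOME"))
```

The unqualified calls return one value per FAMILY, matching the implied iteration, while the subscripted call yields a scalar.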

Conditions on structures 

Logical expressions to deal with tree structures of the
type described in the above section must be capable
of expressing both intra-segment relations (analogous
to present BEAST logical expressions) and interlevel
relations among segments contained in or containing
the reference segment of an expression. The reference
segment of an expression is that segment with which
the value of the expression is associated, distinguishing
it from the other segments upon which the value of the
expression may also depend. For example, the reference
segment of the logical expression

FAMILY'S IN HOUSEHOLD'S WITH LESS THAN 10 PERSON'S

is the FAMILY segment because the expression defines
a condition on FAMILY segments. Three segment
types appear in the expression (FAMILY, HOUSEHOLD,
and PERSON) but the value of the entire
expression is clearly a condition on each FAMILY.
Had the expression been simply

HOUSEHOLD'S WITH LESS THAN 10 PERSON'S

then the reference segment would have been HOUSEHOLD.

Table I gives a formal syntax for sample definitions
using the proposed extensions to logical expressions.
The set of words WITH, IN WHICH, etc., are used as
“noise” words and are not significant for the interpretation
of an expression. The construct 'S is used optionally
to imply a plural and not a possessive. Note
that, for example, the plural of FAMILY becomes
FAMILY'S, not FAMILIES.

TABLE I: Syntax for conditions on structures

samplestatement := DEFINE SAMPLE name AS [refsegmntid] logexp
refsegmntid     := segmntid
segmntid        := segmntname | segmntname'S | segmntname (subscript)
logexp          := boolprim {AND | OR} logexp | boolprim
boolprim        := [WITH | IN WHICH | FOR WHICH | FOR WHOM |
                    HAS | HAVE | INCLUDE | INCLUDES]
                   { inboolprim
                   | IN {samplename | segmntid [boolprim]}
                   | quantifier {inboolprim | segmntid [boolprim]}
                   | {ALL | EVERY} {segmntid boolprim | inboolprim}
                   | NOT boolprim
                   | (logexp) }
inboolprim      := (logexp) | logicalvar | numexp relop numexp
quantifier      := NO | ANY | A | AN | ONLY | [quantop] integerexp
quantop         := EXACTLY | AT LEAST | AT MOST | MORE THAN | LESS THAN

where

logicalvar      := variable of type logical
numexp          := arithmetic expression
relop           := EQ | NE | GT | LT | GE | LE
integerexp      := expression with an integral value

The primary additions to the current BEAST's logical
expression syntax are the three logical primitives
defined by the syntax specifications

(1) IN {samplename | segmntid [boolprim]}
(2) quantifier {inboolprim | segmntid [boolprim]}
(3) {ALL | EVERY} {segmntid boolprim | inboolprim}

1. It is often useful to test whether a segment is
contained in another segment having certain
characteristics, e.g., whether a PRODUCT segment
is contained in a COMPANY segment of
a particular sort or whether a segment with
quarterly data is contained in a segment with
particular annual data. To make such a test we
have added a logical operator with the form

IN {samplename | segmntid [boolprim]}

If a segment identifier immediately precedes the
word IN then the test is applied to that segment.
If no explicit identifier is used then the test is
applied to the reference segment of the expression.
For convenience let us call the segment being
tested 'S'. Considering the form

S IN segmntid [boolprim]

the system first checks whether the segment
containing S is of type segmntid. If no condition
is specified on segmntid then the value of the
IN phrase is the truth value of that inclusion
test. If the segment S is contained in segmntid,
then any condition on segmntid is also evaluated
and the value of the IN phrase becomes the
truth or falsity of the condition on the segment
at the higher level.

If the IN operator has the form

S IN samplename

the test is TRUE if S is a member of the sample
defined by samplename and FALSE otherwise.
In this case the reference segment of the sample
definition must be (1) the same as S or (2) a
segment that contains S.

For example, using the SEO file, one might say

DEFINE SAMPLE S1 AS FAMILY'S
IN HOUSEHOLD WITH AGEOFHEAD OVER 65

The reference segment is explicitly specified
(after AS) as being the FAMILY segment. The
segmntid is HOUSEHOLD, and the Boolean
primitive modifying HOUSEHOLD is WITH
AGEOFHEAD OVER 65. A particular FAMILY
segment will be a member of the sample S1
if it is contained in a HOUSEHOLD segment
with an elderly head.

For a second example, let us assume that
there are two types of family segments, called
COUNTRYFAM and CITYFAM, each of
which may contain PERSON segments. We
define as a sample called CITYFOLK all PERSON
segments contained in CITYFAM segments
by the statement:

DEFINE SAMPLE CITYFOLK AS
PERSON'S IN CITYFAM'S

2. While the first logical operator gave us the
ability to express conditions on the segments
that contain the reference segment of an expression,
the second operator puts conditions
on segments that the reference segment may
contain. This operator has the general form

quantifier {inboolprim ['S] | segmntid [boolprim]}

As with IN, the segment to which this phrase
refers will be the reference segment unless it
immediately follows a different segment identifier.

The quantifier operator tests whether S
contains a specified number (given by the
quantifier) of occurrences of some condition
among its subsegments. The permissible forms
for a quantifier are (1) A, AN, ANY, NO,
ONLY (2) any of the relations

EXACTLY
AT LEAST
AT MOST
MORE THAN
LESS THAN

followed by an integral scalar expression or (3)
simply an integer expression. A, AN, and ANY
are equivalent to AT LEAST 1, and NO is
equivalent to EXACTLY 0. The quantifier
ONLY indicates no specific number of occurrences,
but is TRUE if and only if S contains at
that level only segments of type segmntid, and
they satisfy the condition imposed on them, if
any.

The condition referenced by the quantifier
may be subsegments that satisfy some condition
or simply the existence of the subsegments. A
segment satisfying a condition is specified by
either a segment identifier with a logical primitive,
or simply an intrasegment boolean primitive
(inboolprim) which is a condition made from
attributes all in the same segment type.
Since the value of an inboolprim is uniquely
associated with a particular segment, an inboolprim
is equivalent to a segment with a
condition on it (see example below).

The following sample definitions illustrate
the use of the quantifier logical operator applied
to a file of household survey data.

DEFINE SAMPLE BIGFAMS AS
FAMILY'S WITH AT LEAST 4 PERSON'S

This sample definition has FAMILY'S as its
reference segment. The FAMILY segments in
the sample are defined by a single logical
primitive. According to the syntax specification,
AT LEAST 4 is a quantifier, composed of a
quantop (AT LEAST) followed by an integerexp
which in this example is simply the number 4.
In this example the quantifier is followed by a
simple unqualified segment identifier, PERSON'S.

DEFINE SAMPLE CROWDED AS
FAMILY'S IN HOUSEHOLD'S WITH
AT LEAST 10 PERSON'S

The sample CROWDED is defined using both
the primitives IN and a quantifier. IN is a
condition on FAMILY'S because it follows
immediately after the declaration of the reference
segment. IN is followed here by the segment
identifier HOUSEHOLD'S qualified by the
phrase AT LEAST 10 PERSON'S. Evaluation
of this expression involves a relatively complex
computation on each enumeration unit, since for
each FAMILY (a level 2 segment) it is necessary
to find the total number of PERSON'S (at
level 3) contained in the parent HOUSEHOLD
segment at level 1.

DEFINE SAMPLE ELDERLYFAMS AS
FAMILY'S WITH AT LEAST 2 AGE'S > 60

This example shows one use of the construction
called an intrasegment boolean primitive. Assuming
that AGE is an attribute of the segment
type PERSON, the quantifier phrase above
would be equivalent to AT LEAST 2 PERSON'S
WITH AGE > 60.
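The evaluation of the IN and quantifier primitives described so far can be sketched in Python. This is an illustrative reading, not the BEAST evaluator: the class `Seg` and the functions `in_` and `quantified` are hypothetical, the data are invented, and two assumptions are made that the text leaves open, namely that IN searches all containing segments (not only the immediate parent) and that a quantifier counts matching segments at any depth below S. The CROWDED analogue uses a threshold of 5 rather than 10 to keep the data small.

```python
import operator


class Seg:
    def __init__(self, type_, parent=None, **attrs):
        self.type, self.parent, self.attrs = type_, parent, attrs
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def descendants(self, type_):
        """Yield every contained segment of the given type."""
        for c in self.children:
            if c.type == type_:
                yield c
            yield from c.descendants(type_)


QUANTOPS = {
    "EXACTLY": operator.eq, "AT LEAST": operator.ge, "AT MOST": operator.le,
    "MORE THAN": operator.gt, "LESS THAN": operator.lt,
}


def in_(s, type_, cond=None):
    """S IN segmntid [boolprim]: find the containing segment of that type,
    then evaluate the optional condition on it."""
    node = s.parent
    while node is not None and node.type != type_:
        node = node.parent
    if node is None:
        return False
    return True if cond is None else cond(node)


def quantified(s, quantop, n, type_, cond=None):
    """S WITH quantop n segmntid [boolprim]: count qualifying subsegments."""
    count = sum(1 for c in s.descendants(type_) if cond is None or cond(c))
    return QUANTOPS[quantop](count, n)


hh = Seg("HOUSEHOLD", AGEOFHEAD=70)
f1, f2 = Seg("FAMILY", hh), Seg("FAMILY", hh)
for age in (70, 66, 30, 8):
    Seg("PERSON", f1, AGE=age)
Seg("PERSON", f2, AGE=25)
fams = (f1, f2)

# FAMILY'S WITH AT LEAST 4 PERSON'S
bigfams = [f for f in fams if quantified(f, "AT LEAST", 4, "PERSON")]
# FAMILY'S WITH AT LEAST 2 AGE'S > 60
elderly = [f for f in fams
           if quantified(f, "AT LEAST", 2, "PERSON",
                         lambda p: p.attrs["AGE"] > 60)]
# FAMILY'S IN HOUSEHOLD'S WITH AT LEAST 5 PERSON'S
crowded = [f for f in fams
           if in_(f, "HOUSEHOLD",
                  lambda h: quantified(h, "AT LEAST", 5, "PERSON"))]
```

Note that the CROWDED analogue selects both families, because the count is taken over the whole containing HOUSEHOLD rather than over each FAMILY.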

3. The final condition on a segment is also an
operator applied to its subsegments. Though
similar to ONLY, ALL and EVERY are evaluated
using only the segment type indicated,
and are independent of any other subsegment
types which S may contain at the same level as
segmntid. Also, a condition must be specified on
the segment identifier. The words ALL and
EVERY are equivalent. The general form is

{ALL | EVERY} {segmntid boolprim | inboolprim}

For example, considering a structure of the
form

1 FAMILY
2 CHILD
2 ADULT [ASSETS]

we could say

FAMILY'S IN WHICH EVERY ADULT
HAS ASSETS > 500

and the value of the expression would be independent
of the contents of the CHILD segments
contained in any FAMILY segment.

Input and output

An integral part of the BEAST language is its
reliance upon machine readable codebooks for describing
data files. The machine readable codebook includes a
format description, including the physical and logical
formats of the data file, the name and positions of all
data items in the file, and the meaning of their permissible
values. The BEAST system automatically references
this information to interpret any user commands
relating to a file.

As an example of a simple input request, suppose a
user is investigating the relation between housing
costs and income for different age-sex combinations.
He knows that a given file, SURVEYFILE, contains
the results of a sample survey useful to his investigation;
he also knows that the file contains at least the
following four attributes of each respondent: age, sex,
rent, and monthly income. In order to access this body
of information using the BEAST, he writes:

SELECT SURVEYFILE

to designate SURVEYFILE as the current input file.
The execution of the SELECT statement causes the
BEAST to read the codebook associated with SURVEYFILE
in preparation for an actual input request.

The codebook contains attribute names for each
respondent item; suppose that those corresponding to
the above attributes are AGE, SEX, RENT, and INCOME.
For each attribute, the set of measurements
for all respondents is represented as a column vector.
To extract these attributes, the user writes in the
BEAST:

GET SEX, AGE, INCOME, RENT

Execution of this GET statement causes four vectors
to be extracted from the file and placed in working
storage. There is no ordering rule for the input list;
the order of the names has no relation to their physical
arrangement on the file.
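Access by name through a codebook can be illustrated with a short Python sketch. The codebook layout (start column and width per attribute), the two records, and the function `get` are all hypothetical; the actual format of a BEAST codebook is not given in the text.

```python
# Hypothetical codebook: attribute name -> (start column, width)
# within each fixed-length logical record.
codebook = {"AGE": (0, 2), "SEX": (2, 1), "RENT": (3, 4), "INCOME": (7, 5)}

# Hypothetical SURVEYFILE contents, one record per respondent.
surveyfile = [
    "34M012008500",
    "61F009504200",
]


def get(names, codebook, records):
    """Return one column vector per requested attribute name.

    The order of `names` is free; positions come from the codebook only.
    """
    vectors = {}
    for name in names:
        start, width = codebook[name]
        raw = [rec[start:start + width] for rec in records]
        vectors[name] = [int(v) if v.isdigit() else v for v in raw]
    return vectors


# Analogue of: GET SEX, AGE, INCOME, RENT
data = get(["SEX", "AGE", "INCOME", "RENT"], codebook, surveyfile)
```

Because positions are resolved through the codebook, the request order differs from the physical order on the file without any effect on the result.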

The remainder of this section shows how the “access
by name” referencing of files can be extended to incorporate
the more complex structures described in
this paper. When only one segment type is considered
there is no change from the current BEAST specification
because there is no structural information. To
signal the system that structural information exists
in a file the user replaces the simple attribute list in a
GET statement with either the name of an EU structure
or an actual EU specification. Such a GET statement
indicates that the structural information as well
as the data values should be retrieved from the current
input file. Similarly, an EU specification used in an
output list will result in the display of only the attributes
and structure indicated in the specification. As with the
current BEAST, if no subscripts or sample qualifications
are specified in an I/O list then every occurrence of the
elements specified in the list will be retrieved or printed.

Using this form of I/O list one could write

DEFINE EU DWELLING AS
    1 HOUSEHOLD [CITY, STATE]
    2 FAMILY [FAMTYPE]
SELECT SURVEYFILE
GET DWELLING (1...100)

The first statement defines DWELLING as a tree
structure with two levels, the household level and the
family level. There are two attributes at the household
level; they give the city and state where the
household is located. There is only one attribute at the
family level, an indication of family type. The GET
statement results in the extraction of the first 100 of
these enumeration units from the data file called
SURVEYFILE. The resulting number of FAMILY
segments in these 100 HOUSEHOLDS is unknown,
but it can be found by using the TOTAL function.

LET NFAMS = TOTAL FAMILY'S IN
HOUSEHOLD'S (1...100)

Since the segment identifier FAMILY'S is used with
an explicit qualifier as the object of TOTAL, the value
of the function will be a scalar equal to the number of
FAMILY segments contained in the first 100 HOUSEHOLD
segments.


A small BEAST program 

We conclude with a small but complete program
utilizing the data structures and statement types
described in this paper. This example also illustrates
two other BEAST statement types, the REPEAT and
COMPUTE statements. The iteration statement in
BEAST is distinguished by the fact that its dummy
argument is defined “by name” rather than “by value.”
This is a useful device permitting the dummy to be used
on the left side of an assignment statement, to be only
partially defined on entry of a repeat block, and to
assume as a value any entity in the language that may
be named. The general form of the iteration statement
is given by

[label:] REPEAT FOR dummy1 = namelist1
         [AND FOR dummy2 = namelist2] ...

END [label]

The dummy variable must be used in such a way within
the range of a REPEAT that substitution of all elements
of the list results in syntactically correct BEAST
statements.
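One way to picture a dummy defined "by name" is as textual substitution of each list element into the body of the block. The Python sketch below takes that reading as an assumption; the function `expand_repeat` is hypothetical and the text does not say that BEAST implements REPEAT by literal text expansion.

```python
import re


def expand_repeat(dummy, namelist, body):
    """Return the statements produced by substituting each name in
    `namelist` for whole-word occurrences of `dummy` in `body`."""
    pattern = re.compile(r"\b" + re.escape(dummy) + r"\b")
    return [pattern.sub(lambda m: name, line)
            for name in namelist
            for line in body]


body = ["COMPUTE CROSSTAB OF URBANRURAL, SKILEVEL ON SAMPLE X"]
expanded = expand_repeat("X", ["S1", "S2"], body)
```

Under this reading the dummy may stand for any nameable entity, here a sample, and each expanded line must itself be a valid statement.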

The COMPUTE statement is used to execute com- 
plex statistical procedures and print their results. The 
COMPUTE statement has the general form 

COMPUTE procedure OF dataphrase [WITH 
optionsphrase] [ON SAMPLE name] 

The procedure may specify any of a number of procedures
including cross-tabulation, correlation, multiple
regression, and analysis of variance. The data to
which the procedure is to be applied is specified in the
dataphrase, and the exact form of the dataphrase depends
on the procedure being invoked. The parameters
of the procedure can be modified using the optionsphrase.
One may, for example, specify that the residuals of a
regression equation are to be printed as part of the
output.

When the arguments of a procedure are at more than
one level the number of “observations” derived from
an enumeration unit equals the number of occurrences
of the lowest level reference. In such a case the values
of the higher level references are distributed over their
subsegments giving a rectangular expansion of the tree
structure. When the phrase ON SAMPLE name is appended
to a COMPUTE statement the procedure is
executed using only the observations that are included
in the sample name. The reference segment of the sample
must be at least as high as the lowest level attribute
in the dataphrase.
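The rectangular expansion of a tree can be sketched in Python as below. The nested-dictionary layout, the function `expand`, and the attribute values are assumptions made for illustration; the sketch shows only the idea that each lowest-level segment yields one observation carrying copies of the attributes of the segments that contain it.

```python
def expand(segment, child_types, inherited=None):
    """Yield one flat observation per lowest-level segment, with the
    attributes of all containing segments distributed over it."""
    row = {**(inherited or {}), **segment["attrs"]}
    if not child_types:
        yield row
        return
    for child in segment.get(child_types[0], []):
        yield from expand(child, child_types[1:], row)


# Hypothetical DWELLING occurrence: one HOUSEHOLD, two FAMILY's,
# three PERSON's in all.
household = {
    "attrs": {"SEXOFHEAD": "F"},
    "FAMILY": [
        {"attrs": {"URBANRURAL": 1},
         "PERSON": [{"attrs": {"AGE": 40, "SKILEVEL": 2}},
                    {"attrs": {"AGE": 12, "SKILEVEL": 0}}]},
        {"attrs": {"URBANRURAL": 1},
         "PERSON": [{"attrs": {"AGE": 67, "SKILEVEL": 3}}]},
    ],
}

rows = list(expand(household, ["FAMILY", "PERSON"]))
```

The three PERSON segments give three observations, so a FAMILY with no PERSON subsegments would contribute none, in line with counting occurrences of the lowest level reference.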

Table II shows a program that uses the six subsets 
defined in the introduction as selection criteria for two 
cross tabulations using the Survey of Economic Oppor- 
tunity file. The program will calculate and print a total 
of 12 cross tabulations, two on each of the six samples 
defined. Because the variables in the COMPUTE are 
at the PERSON level we may use either PERSON, 
FAMILY, or HOUSEHOLD level samples. 

TABLE II: Sample program

#DEFINE THE STRUCTURE OF THE ENUMERATION UNIT#
DEFINE EU DWELLING AS 1 HOUSEHOLD [SEXOFHEAD, AGEOFHEAD]
                      2 FAMILY [FAMTYP, REGION, URBANRURAL]
                      3 PERSON [WAGES, SALARY, AGE, SKILEVEL]
DEFINE SAMPLE S1 AS FAMILY'S WITH NO AGE'S > 65
DEFINE SAMPLE S2 AS HOUSEHOLD'S IN WHICH AT LEAST 2 PERSON'S HAVE
                    (WAGES + SALARY) > 5000
DEFINE SAMPLE S3 AS FAMILY'S WITH 2 AGE'S < 21
DEFINE SAMPLE S4 AS PERSON'S IN HOUSEHOLD'S WITH SEXOFHEAD EQ 'F'
DEFINE SAMPLE S5 AS FAMILY'S WITH REGION = 1
DEFINE SAMPLE S6 AS PERSON'S IN FAMILY'S (WITH AT LEAST 5 PERSON'S
                    AND WITH REGION EQ 7)
SELECT SEO66 #SEO66 IS NOW THE INPUT FILE.#
GET DWELLING (1...5000)
L: REPEAT FOR X = S1, S2, S3, S4, S5, S6 #ITERATE OVER SAMPLE DEFINITIONS#
   #THE PERMISSIBLE CATEGORIES FOR EACH VARIABLE ARE GIVEN IN THE
   CODEBOOK OF SEO66.#
   COMPUTE CROSSTAB OF SEX, AGE, SKILEVEL ON SAMPLE X
   COMPUTE CROSSTAB OF URBANRURAL, SKILEVEL ON SAMPLE X
END L #REPEATS MAY BE NESTED TO ANY DEPTH.#
STOP

CONCLUSION

Languages designed for statistics have tended to operate
upon the simplest data structures, while languages
with facilities for the more complex structures have
seldom had the statistical operations that have made
the current version of BEAST attractive. By extending
BEAST to include the tree structures described here
we hope to increase the usefulness of the language
without sacrificing any of the convenience of the current
language. While the methods of referencing such structures
have been stressed here it is nonetheless important
to be able to manipulate such structures to add and
delete individual segments and entire levels. We have
not presented our tentative solutions to the problems
of manipulating segments.

Nanosecond threshold logic gates for
16 x 16 bit, 80 ns LSI multiplier

INTRODUCTION

Previous research and development efforts in digital 
monolithic integrated circuits and arrays were almost 
exclusively concerned with Boolean logic. However, 
by introducing threshold logic, considerable savings in 
gate count as well as in subsystem processing speed 
are evident. When logic subsystems, such as registers,
adders, counters or combinational control logic, designed
with common NOR logic, were replaced by
subsystems employing threshold logic, average savings
in gate count of three to one have been demonstrated.15,16
Furthermore, the number of consecutive logic levels
necessary to implement a given switching function,
and thus the relative processing delay, is also generally
reduced by the same ratio.

The full adder function requires two inverting 
threshold gates, and carry propagation is accomplished 
with only one gate delay per stage. Basic flip-flop 
types can be implemented by a single threshold gate. 
Advanced parallel adders for three addends would 
consist of three threshold gates per bit, and functional 
multipliers should also become practical in iterative 
array implementation. 

For full utilization of the much greater logical 
capability of threshold gates, the employed technologies 
should be amenable to large scale integration which 
excludes hybrid approaches. Utilization of such mono- 
lithic threshold gates and arrays is possible in most 
kinds of computers, data processing and control 
equipment. 


Threshold logic gates with nonlinear feedback 

Smith and Pohm have demonstrated the ultra-high
speed capability of threshold logic gates in the form
of RTL gates modified with negative, nonlinear current
feedback. 1,2 In these gates V_cb was clamped to V_ref
= V_be by tunnel or backward diodes (Figure 1) which
prevented both saturation and cutoff; thus, the transistor
always operated in the ON condition near the
f_T peak. In reverse direction, of course, the diodes
functioned as the familiar Baker clamp. Propagation
delays between 5 and 1.2 ns were achieved in breadboard
implementation with fan-out of 3 and 2. By
varying the bias current I_b, the authors implemented
NAND, Majority, and NOR for various values of
fan-in ≤ 8 and also the threshold functions lying between
these special cases. The gates were not amenable
to monolithic integration, however, because of the
tunnel diodes and because high-accuracy resistors
were required for current biasing.

Figure 1 — Modified RTL gate with tunnel diode feedback
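The way a single bias setting selects the logic function can be illustrated with an idealized model. The following Python sketch is not taken from Smith and Pohm's circuit: the function names, the unit input weights and the integer threshold (standing in for the bias current) are assumptions made purely for illustration.

```python
def inverting_threshold_gate(inputs, threshold):
    """Idealized inverting threshold gate (illustrative model only).

    inputs    -- sequence of 0/1 logic values, each with unit weight
    threshold -- number of high inputs needed to pull the output low;
                 a hypothetical stand-in for the bias current setting
    """
    high_inputs = sum(inputs)
    return 0 if high_inputs >= threshold else 1


def nor_gate(inputs):
    # Lowest threshold: a single high input pulls the output low.
    return inverting_threshold_gate(inputs, 1)


def nand_gate(inputs):
    # Highest threshold: all inputs must be high to pull the output low.
    return inverting_threshold_gate(inputs, len(inputs))


def minority_gate(inputs):
    # Intermediate threshold: inverted majority of the inputs.
    return inverting_threshold_gate(inputs, len(inputs) // 2 + 1)


if __name__ == "__main__":
    for pattern in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
        print(pattern, nor_gate(pattern), minority_gate(pattern), nand_gate(pattern))
```

Any threshold between 1 and the fan-in gives one of the intermediate threshold functions mentioned above.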

Threshold logic gates for LSI 

When the first experimental Schottky barrier diodes
became available, 3 circuits similar to the gates described
in the first section, but having symmetrical transfer
functions, were studied by the author. A pair of anti-parallel
diodes was used for the collector base feedback
(Figure 2). The plastic-encapsulated diodes, with
molybdenum silicon interface, had 0.8 pF capacity
at 0 V, and V_D = 0.4 V and ΔV/ΔI = 25 Ω at 1 mA.
Type 2N918 transistors were selected for maximum
f_T and minimum r_b'. With 19.2 mW average power and
i = 0.8 mA, the switching times (t_d + t_r) and (t_s
+ t_f) were between 1.65 and 1.85 ns.

Luce confirmed these switching time measurements, 4
and using experimental transistors SMX2-T with
f_T = 2.8 GHz at V_cb = 0 V and I_c = 2 mA, he
achieved propagation delays of 1.8 ns with only 5.6 mW
circuit power. The 400 mV, TO-18 Schottky diodes
contributed 2 pF Miller capacity. With reduced voltage
swing of ±300 mV, Luce attained average t_pd = 1.4 ns
and minimum t_pd = 0.8 ns.

Figure 3 depicts the symmetrical current-in, voltage- 
out transfer function and the summing point character- 
istics of the new gate which exhibits, at I, ft a switching 
step in V BB of only 23 mV. 

A basic improvement of this symmetrical threshold
gate over RTL circuits should be pointed out. In RTL
it is the sum of input currents I_in plus the (negative)
base bias current I_b which turns on the npn transistor.
In the new threshold gate, the (positive) base bias
current keeps the transistor at the threshold point
in ON condition. The input current sum of
± i/2 is merely required to switch the gate from
the threshold point to its high or low state.

First order worst case analysis of the basic 5-input
gate was performed under the assumption of temperature
tracking of communicating circuits in monolithic
LSI environment. The transistors should have β ≥ 40
and V_BE matching of ± 5 mV, while a resistor ratio
tolerance of ± 3 percent is required. These characteristics
can be attained in LSI with good manufacturing
yield. The Schottky diodes should have V_D = 0.4 V
± 15 percent at 0.8 mA and very low capacitance for
the high-speed version of the gate (i ≥ 0.8 mA).
Several other versions are discussed in the next section.

Figure 2 — Threshold gate with Schottky diode feedback

Figure 3 — Transfer function of the gate shown in Figure 2

Compatible metal-silicon Schottky diodes have been
proposed (Mo 6,17) and implemented (Mo, 6 Al, 7,8,19 Pt 18)
as Baker clamps in integrated circuits which were mostly
of the TTL type. The same technology is applicable
to the modified RTL threshold gates. Transistors with
f_T ≥ 2 GHz are now available for LSI utilization at low
I_c and V_cb values. 9,10

Various optimizations of the integrated gate 

While the experiments described in an earlier section 
were concentrated on high speed gates, with unity 
current step i = 0.8 mA, other circuit options would 
emphasize optimization in the following areas. 

Low power. 

For low fan-in gates, power consumption can be
reduced by small input and collector currents and by
lower collector voltage. The former implies transistor
beta ≥ 80 in order to minimize the influence of absolute
variations in G bby G* and 0) unity currents i ≥ 0.2 mA
are attainable with this beta. The latter requires an
active source for the collector current. This would
improve the circuit dc performance since collector
current variations between the high and the low output
states would be minimized. Trade-off studies are
required in order to determine whether current source
or collector resistor contributes lower collector load
capacities. The high area consumption and the low
beta of lateral pnp transistors make the current source
less attractive for LSI circuits at this time.

High input weight count. 

When many or all inputs are low, the high negative
summation current must be accommodated by feedback
diode current and collector resistor current. Minimum
input current (i/2 ≥ 0.1 mA) and high transistor
beta are again required. Tantalum thin-film overlay
resistors would provide high sheet resistivity for accommodating
the large number of input resistors
without requiring overly large substrate area consumption.
Decoupling diodes 1,2 would ease the problems
of leakage currents and of resistor tolerance requirements.

Improved noise immunity. 

Two Schottky diodes in series per feedback branch
(or simply two anti-parallel silicon diodes) would
increase the voltage swing to ±0.8 V (or ±0.6 V).
Collector biasing would be required in order to avoid
saturation. 1,2

High fan-out. 

Collector biasing in combination with an emitter-follower
output stage (Figure 4) would greatly improve
the output drive capability. 1

RC inputs.

As Smith and Pohm pointed out, capacitive shunting
of the input resistors would increase the gate switching
speed for very low fan-out. Capacitive shunting, however,
is an acceptable method only for low-noise array
environment and for non-reversing switching transitions
(no spikes), such as in a ripple carry.

Reduced Miller effect. 

The detrimental Miller effect could be reduced if
only one (symmetrical) Schottky diode were used with
V_D = ±0.4 V at I_D = ±i/2. Following a suggestion
by Schuermeyer, 11 this type of diode can be obtained
through very high concentration of surface states.

Proposed functional LSI multiplier 

Recent advancements in the state-of-the-art of
silicon processing for medium scale and large scale
integration have made possible the implementation of
monolithic arrays composed of the new threshold gate.

Figure 4 — Collector biasing and emitter follower output

High densities with 11 mil² average area consumption
per component have been achieved in pilot line LSI
with good processing yield; 10 this includes intra/interconnections
and three layers of metallization which
facilitate optimum layout. The array was an 8-bit
adder employing ECL trees with 1.2 ns carry propagation.
The transistors have 0.15 × 0.8 mil emitters
and 100 Ω/square base resistivity. The resistors were
0.5 mil wide with values in the 100 to 400 Ω range and
exhibited 6 percent ratio tolerance on 60 Ω/square
material.

The threshold gates of the second section require
clusters of equal resistors in the 1 to 4 kΩ range
with 3 percent ratio tolerance. This tolerance could
be attained with 0.5 mil wide resistors on 100
Ω/square base material. The 2 GHz transistors with
0.1 × 0.4 mil emitters discussed by Phillips et al. 10
should also be amenable to LSI in the early Seventies.

The 10 mW high speed gate with i = 0.8 mA uses
1 kΩ resistors; with fan-in of 5, this gate would encompass
a substrate area of approximately 6 × 13
mil².

A 16 X 16 bit functional multiplier is proposed for 
LSI implementation using the 10 mW, 1 ns threshold 
gate. Figure 5 shows the matrix of multiplier cells in 
skewed form with all sum and output bits having a 
given binary weight arranged in the same column. 
Each cell M of the multiplier (Figure 6) is composed
of a full adder and an AND gate which performs the
multiplication. The cell in Figure 6 operates on X_i and
Y_j and the adder inputs are P_ij = X_i Y_j, C_(i-1)j,
and S_(i+1)(j-1). If the gates of the third section with
symmetrical transfer function are used and if
T'_{w_0}(X_1, X_2, ..., X_k) = T'(X) is the inverting threshold
function, all three multiplier-cell gates can be implemented
with the threshold w_0 = 0:

$$P_{ij} = X_i Y_j = T_{+0.5}(X_i, Y_j) = T'_0(X'_i, Y'_j, +0.5)$$

$$C'_{ij} = T'_0(P_{ij}, C_{(i-1)j}, S_{(i+1)(j-1)})$$

$$S'_{ij} = T'_0(P_{ij}, C_{(i-1)j}, S_{(i+1)(j-1)}, 2C'_{ij}).$$

Figure 5 — 16 × 16 bit functional multiplier

Figure 6 — Functional multiplier cell
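The three cell equations can be checked with a small numerical model. The sketch below is an illustration under stated assumptions, not the authors' circuit: it assumes each logic input contributes a current of +0.5 (high) or -0.5 (low) in units of i, that the inverting gate output is low whenever the summed input current is positive, and that the names `inverting_threshold`, `level` and `multiplier_cell` are hypothetical.

```python
def inverting_threshold(*currents):
    """Inverting threshold gate with threshold 0 (idealized).

    Each argument is a signed input current in units of i (assumed
    convention: +0.5 for a high input, -0.5 for a low input).
    Returns -0.5 (low) if the current sum is positive, else +0.5 (high).
    """
    return -0.5 if sum(currents) > 0 else 0.5


def level(bit):
    """Map a logic bit to the assumed signed current level."""
    return 0.5 if bit else -0.5


def multiplier_cell(x, y, c_in, s_in):
    """One multiplier cell: AND gate plus threshold-logic full adder.

    Returns (sum_bit, carry_bit) for x*y + c_in + s_in.
    """
    # P = T'_0(X', Y', +0.5): inverted inputs plus a +0.5 bias current.
    p = inverting_threshold(-level(x), -level(y), 0.5)
    # C' = T'_0(P, C, S): inverted carry (inverted majority).
    c_not = inverting_threshold(p, level(c_in), level(s_in))
    # S' = T'_0(P, C, S, 2C'): inverted sum, with C' weighted twice.
    s_not = inverting_threshold(p, level(c_in), level(s_in), 2 * c_not)
    # The gate outputs are inverted, so a low output means a logic 1.
    return int(s_not < 0), int(c_not < 0)


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            for c in (0, 1):
                for s in (0, 1):
                    print(x, y, c, s, multiplier_cell(x, y, c, s))
```

Under these assumptions the summed current never equals zero, so each gate has a margin of half a unit current on either side of the threshold.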

The proposed multiplier would be implemented on
four LSI dies with 64 iterative cells each (Figure 7)
with two layers of metallization. For attaining optimum
layout and, thereby, higher component density, an
implementation with three layers may be preferable. 10
The three gates per cell would occupy an area of 13
× 18 mil², and each LSI die would have an area of
≈ 115 × 155 mil². For 10 mW power per gate, each
64-cell array will consume 1.92 W power, and 44-pin 1 × 1
in² stud packages would provide adequate thermal
management.
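The power and area figures quoted above follow from simple arithmetic, which the sketch below reproduces. The function name `die_budget` and the returned keys are hypothetical; all input numbers are those given in the text.

```python
def die_budget(gates_per_cell=3, cells_per_die=64, dies=4,
               power_per_gate_mw=10.0,
               cell_area_mil2=13 * 18, die_area_mil2=115 * 155):
    """Power and area budget for the proposed four-die multiplier."""
    power_per_die_w = gates_per_cell * cells_per_die * power_per_gate_mw / 1000.0
    cell_area_per_die = cells_per_die * cell_area_mil2
    return {
        "total_cells": cells_per_die * dies,          # 16 x 16 cell matrix
        "power_per_die_w": power_per_die_w,           # per 64-cell array
        "total_power_w": power_per_die_w * dies,      # all four dies
        "cell_area_per_die_mil2": cell_area_per_die,
        "die_area_mil2": die_area_mil2,
        # Share of the die occupied by the iterative cells themselves.
        "cell_area_fraction": cell_area_per_die / die_area_mil2,
    }


if __name__ == "__main__":
    for name, value in die_budget().items():
        print(name, value)
```

With these inputs the 1.92 W figure corresponds to one 64-cell die, and the cells account for 14,976 mil² of the roughly 17,825 mil² die.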

The multiplier cells are used in two complementary
ways: as Type 1 cells with positive inputs/negative
outputs (Figure 8a) and as Type 2 cells with negative
inputs/positive outputs (Figure 8b). Equivalent to the
odd/even levels in NOR logic design, 12 alternating
matrix columns (Figure 9a, b) are composed of Type 1
cells using inverted inputs X'_i (i = odd, e.g.) and of
Type 2 cells using true inputs X_(i+1) (i + 1 = even).
Only one bus connection to the matrix is required per
flip-flop in the X-register, whereas both Y_j and Y'_j
are bussed through each horizontal row. An additional
column i = 17 of carry inverters (Figure 10) converts
C_(16)j into S_(17)j.

Figure 7 — 16 × 16 bit functional multiplier on four LSI dies
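The data flow through the cell matrix can be sketched behaviorally. The model below is an assumption-laden illustration, not the authors' logic design: it uses an ordinary Boolean full adder in place of the threshold gates, ignores the Type 1/Type 2 signal inversions, and the names `full_adder` and `array_multiply` are hypothetical. It follows the indexing in the text: cell (i, j) adds P_ij, the carry from cell (i-1, j) and the sum from cell (i+1, j-1), and the carry out of the last column becomes the extra sum bit of that row.

```python
def full_adder(a, b, c):
    """Boolean full adder: returns (sum_bit, carry_bit)."""
    total = a + b + c
    return total & 1, total >> 1


def array_multiply(x, y, k=16):
    """Multiply two k-bit unsigned integers with a ripple-carry cell matrix."""
    x_bits = [(x >> i) & 1 for i in range(k)]
    y_bits = [(y >> j) & 1 for j in range(k)]
    prev_sums = [0] * (k + 1)          # sum outputs of the previous row
    product = 0
    for j in range(k):                 # one row per multiplier bit Y_j
        sums = [0] * (k + 1)
        carry = 0
        for i in range(k):             # carry ripples along the row
            p = x_bits[i] & y_bits[j]  # AND gate of the cell
            sums[i], carry = full_adder(p, carry, prev_sums[i + 1])
        sums[k] = carry                # extra column: carry becomes a sum bit
        product |= sums[0] << j        # lowest sum bit of the row is final
        prev_sums = sums
    for i in range(1, k + 1):          # upper product bits from the last row
        product |= prev_sums[i] << (k - 1 + i)
    return product


if __name__ == "__main__":
    print(array_multiply(40000, 50000))
```

Each row adds one partial product to the right-shifted result of the previous row, which is why the lowest sum bit of every row is already a final product bit.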

Although the average carry ripple length is much 
shorter 13 than the full length of each partial product 
adder (having j = const.), no carry look-ahead cir- 
cuitry is included since it would corrupt the iterative 
structure of the multiplier and also the approach of 
minimum wafer area consumption of the LSI array. 

The worst case multiplication time t_M for two k-bit
numbers includes 2k - 1 carry delays and k - 1
sum delays (Figure 10). Three nanoseconds should be
allowed for each package-to-package transition t_tr
assuming matched transmission lines, and a setting
time of less than 2 ns is required for the output flip-flops
Q, each of which consists of a single threshold
gate. 14 For k = 16,

$$t_M = t_{pd}(\mathrm{AND}) + 31\, t_{pd}(\mathrm{Carry}) + 15\, t_{pd}(\mathrm{Sum}) + 5\, t_{tr} + t_{set}$$

$$= (1 + 31 + 30 + 15 + 2)\ \mathrm{ns} = 79\ \mathrm{ns}.$$

Figure 8a — Type 1 cell
Figure 8b — Type 2 cell

Figure 9a — Alternating type 1/type 2 cells in the multiplier matrix
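The worst-case timing budget can be reproduced for other word lengths with a short calculation. In the sketch below the per-term delays (1 ns for the AND gate, 1 ns per carry, 2 ns per sum) are inferred from the numerical line of the equation rather than stated separately in the text, and the function name and parameter names are hypothetical.

```python
def worst_case_multiply_time_ns(k=16, t_and=1.0, t_carry=1.0, t_sum=2.0,
                                transitions=5, t_transition=3.0, t_set=2.0):
    """Worst-case multiplication time for two k-bit numbers, in ns.

    The longest path contains one AND delay, 2k - 1 carry delays,
    k - 1 sum delays, the package-to-package transitions and the
    setting time of the output flip-flop.
    """
    return (t_and
            + (2 * k - 1) * t_carry
            + (k - 1) * t_sum
            + transitions * t_transition
            + t_set)


if __name__ == "__main__":
    print(worst_case_multiply_time_ns())
```

The number of package transitions is kept as a parameter because it depends on how the matrix is partitioned across dies; the value 5 is the one used in the text for the four-die layout.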

CONCLUSIONS 

New low-power nanosecond threshold logic gates which 
are amenable to monolithic LSI have been discussed. 
These gates require high-performance integrated de- 
vices, and the necessary advanced silicon processing 
techniques should be available with high manufactur- 
ing yield in the early Seventies. Functional LSI multi- 
pliers with 80 ns multiplication time for two 16-bit 
numbers have been proposed. Such multipliers and
similar monolithic LSI arrays, e.g., high-speed adders,
counters, and control logic subsystems, can be advantageously
implemented with threshold logic; 16 the
average savings in gate count is 3:1, and the number of
interconnections is reduced by 2:1 or more. LSI arrays
with the new 10 mW, 1 ns threshold gate would be
applicable to future ultra-fast, low-power data processing
systems. Practical procedures for logic design
with threshold logic gates were published elsewhere
by Winder. 16

Figure 9b — Alternating columns X'/X composed of type 1/type 2 cells

Figure 10 — Longest path for worst-case multiplication time
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Silicon-on-sapphire complementary MOS 
circuits for high speed associative 
memory * 

by J. R. BURNS and J. H. SCOTT 

RCA Laboratories 
Princeton, New Jersey 


INTRODUCTION 

The utility of associative memory in a wide variety of in- 
formation handling systems has been long recognized 
and in the early 1950's such memory systems were pro- 
posed for implementation through cryotron logic and 
storage arrays. Cryogenic element technology afforded 
the ingredient of compatible logic and memory within 
a basic cell, a requirement essential to the practical 
realization of associative memories. To date, such an 
approach has not been successful due mainly to proc- 
essing difficulties connected with thin film elements 
operating in a liquid helium environment. Other ap- 
proaches, involving the use of multi-apertured mag- 
netic elements, have been proposed and implemented, 
but the resultant cost was prohibitive due to complexi- 
ties of peripheral electronics as well as the magnetic 
storage element itself. Furthermore, systems of this
type have relatively long parallel search times (10
μsecs) especially if access is on a serial-by-bit basis.
These considerations have seriously limited the appli- 
cability of associative concepts in all forms of data proc- 
essing and have resulted in a situation where system 
designers do not consider associative memory as a 
solution to a given problem in spite of many obvious
advantages in applications such as sorting, merging,
pattern recognition, and most recently, memory allocation
in time shared computers.

* The research reported in this paper was sponsored by the Electronic
Research Directorate, Rome Air Development Center, Air
Force Systems Command, Griffiss Air Force Base, New York
under contract F30602-68-C-0197.

Many of the objections mentioned above are not 
valid today because of the rapid evolution of integrated 
circuit technology. This is particularly the case for 
semiconductor memory arrays where the universality 
and regularity of such subsystems take full advantage
of the low cost potential of Large Scale Integration 
(LSI). Considerable effort has been expended through- 
out the industry on high speed random access memory 
arrays having non-destructive read-out in the sub- 
100 nanosecond range where the cost of competitive 
magnetic memories is dictated by the high quality 
peripheral electronic circuitry. Although a substantial 
part of this effort has been on bipolar memories, the 
dominant trend is toward MOS memories because of 
the simpler processing technology, lower power dissi- 
pation, and smaller silicon area per bit, all of which 
lead to lower cost systems. Monolithic silicon MOS 
memories generally take two forms, i.e., single polarity 
MOS arrays, usually P type, and complementary MOS, 
a unique circuit configuration capable of higher speed 
and extremely low power but at the expense of more 
complex processing technology and slightly higher 
costs. This is the approach taken here for the realiza- 
tion of sophisticated associative memory with one 
important difference, namely, the utilization of thin 
film silicon-on-sapphire technology 1 for the fabrication 
of high quality complementary MOS arrays. Silicon-on-sapphire
combines the best features of monolithic
silicon and thin film integrated circuits through the
epitaxial growth of thin films of single crystal silicon
on sapphire substrates which can be selectively removed
so that all parasitic reactance which degrades
the performance of monolithic circuits is effectively
eliminated. Coupled with the improved circuit performance
is a potentially simpler processing sequence for
CMOS integrated arrays (requiring only two noncritical
source-drain diffusions) which will substantially reduce
costs as well.

Associative memory design 

General considerations 

Several considerations influence the design of an 
associative memory array, the majority of these having 
to do with the sophistication required of the memory. 
Based on requirements believed to be minimal in most
associative applications, the following features are
desirable:

1. Normal operation as a read-write random access
memory (having high speed non-destructive
read out) in addition to completely parallel
associative search operation.

2. “Masked” search capability so that any part of 
the total field can be eliminated from the search 
word. This will also provide a “masked” write 
wherein partial updating of the field of a selected 
word is possible. 

3. Modular array design so that associative memories
of arbitrary numbers of words and bits
per word can be constructed by “wired OR”
connections of the word and bit lines of individual
modules.

Accordingly, the module was chosen to be one of
four words each four bits long and has the basic block
diagram shown in Figure 1.

Operation of the module is summarized in Table I.

As shown, the module performs as a read-write
memory in addition to the ability to perform a completely
parallel search. In the “don’t care” condition,
where both lines of a bit line pair are “0,” the
corresponding digit is completely masked off and
will not produce a mismatch signal in any word regardless
of the contents of that bit in the word. Design
of the basic cell which performs these various functions
is discussed in the next section.

Associative cell design 

The circuit diagram of the basic cell, designed to
implement the aforementioned functions, is shown in
Figure 2 and is seen to consist of 14 MOS devices, 10
N channel and 4 P channel. The flip-flop portion consists
of the cross-coupled CMOS inverters N_1, P_1, and
N_2, P_2. To write a “1” into the cell, W and D_1 are raised
to +V_0 volts and D_2 remains at ground. This combination
cuts off P_3 while the series combination of N_3
and N_6 pulls the “0” side of the flip-flop down toward

TABLE I — Associative module system operation

FUNCTION          WORD AND BIT LINE CONDITIONS              RESPONSE

Write             W_i = “1”; D_1j = “1”, D_2j = “0”         Write “1” in jth bit of ith word.
                  W_i = “1”; D_1j = “0”, D_2j = “1”         Write “0” in jth bit of ith word.

Read              W_i = “1”; all D lines = “0”              Non-destructive read of ith word; stored
                                                            contents determined by presence or
                                                            absence of I_s on lines D_2j.

Parallel Search   All W_i = “0”; D_1j = “1”, D_2j = “0”     Search for “1” in jth bit.
                  All W_i = “0”; D_1j = “0”, D_2j = “1”     Search for “0” in jth bit.
                  All W_i = “0”; D_1j = “0”, D_2j = “0”     Don’t care.
                                                            Mismatch of any bit in word indicated
                                                            by current on W lines.
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Figure 1— Associative memory block diagram 



Figure 2 — Complementary MOS associative cell 


ground and after one stage delay the “1” side is up
at +V_0. Similarly, a “0” is written by raising W and
D_2 to +V_0 with D_1 at ground potential. Note that
when both D lines are grounded and W is high, the state
of the cell is unchanged.

Non-destructive read out is obtained by again selecting
W, thereby turning on transistor N_10, and keeping
all D lines at ground. Depending on the state of
the cell, N_8 is either on or off and a large or negligibly
small current is produced on the low impedance D_2 line.

Mismatch detection by means of a parallel search is
accomplished with all W lines grounded and placing
the search criterion on each bit line pair, i.e., D_1 =
+V_0, D_2 = 0 for “1”; D_1 = 0, D_2 = +V_0 for “0”; and
D_1 = D_2 = 0 for “don't care” or “Ø”. Transistors
N_6, N_7, N_8, and N_9 form the local exclusive OR circuit.
If the stored information mismatches the information
on the bit lines, one of the pairs N_6-N_7 or N_8-N_9
will form a conducting path from the positive supply
to the W line (at ground potential) and produce a large
current (~1 mA) in the W line. Both pairs will be cut
off if there is a match or if a “don't care” condition
prevails in that bit location. Since all such circuits are
OR'd together across the word, a match occurs only if
all exclusive OR gates are open, i.e., only a negligibly small
current appears on the W line. Any bit of the word mismatching
the search bit will generate a mismatch current
for the entire word.
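The write, read and masked search operations described above can be summarized in a behavioral model. The sketch below is illustrative only: it models logic states rather than currents, the class name `AssociativeModule` and the constants `ONE`, `ZERO` and `DONT_CARE` are hypothetical, and the bit line pair (1, 1), which the text does not define, is simply left unhandled for writes.

```python
# Bit line pair encodings (D_1, D_2) taken from the text.
ONE = (1, 0)        # write or search for "1"
ZERO = (0, 1)       # write or search for "0"
DONT_CARE = (0, 0)  # masked bit: no write, no mismatch


class AssociativeModule:
    """Behavioral model of the 4-word by 4-bit associative module."""

    def __init__(self, words=4, bits=4):
        self.cells = [[0] * bits for _ in range(words)]

    def write(self, word, bit_lines):
        """Write with W high; bits with both lines grounded are unchanged."""
        for j, pair in enumerate(bit_lines):
            if pair == ONE:
                self.cells[word][j] = 1
            elif pair == ZERO:
                self.cells[word][j] = 0

    def read(self, word):
        """Non-destructive read of one word."""
        return list(self.cells[word])

    def search(self, bit_lines):
        """Parallel search with all W lines grounded.

        Returns one mismatch flag per word; a word matches only if
        none of its cells conducts (exclusive OR of stored and sought bit).
        """
        flags = []
        for row in self.cells:
            mismatch = any(
                (stored == 0 and d1 == 1) or (stored == 1 and d2 == 1)
                for stored, (d1, d2) in zip(row, bit_lines)
            )
            flags.append(mismatch)
        return flags


if __name__ == "__main__":
    module = AssociativeModule()
    module.write(0, [ZERO, ZERO, ZERO, ONE])
    module.write(1, [ONE, ONE, ZERO, ONE])
    print(module.search([DONT_CARE, DONT_CARE, DONT_CARE, ONE]))
```

Because the same bit line encoding serves both writing and searching, a masked write and a masked search differ only in the state of the word line.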

It should be noted that read out and mismatch de- 
tection are both accomplished by current sensing in a 
low impedance line. This is an extremely high speed 
operation as the relatively large capacitance on the 
word and digit lines can be swamped by the low input 
resistance of a grounded base bipolar and the voltage 
conversion done at the relatively low capacitance col- 
lector and at essentially the same current level. (A 
complementary emitter follower performs more than 
adequately as a combination drive-sense circuit on 
both word and digit lines.) In high speed table look-up 
applications, such as “paging” in time shared computers,
fast parallel search and access is extremely
desirable as this function is carried out once every main 
memory cycle. 

Another aspect of current sensing on the mismatch
line is that the magnitude of the mismatch current is
directly proportional to the number of bits in error in
that particular word. Utilization of analog detection
circuitry on this line will then enable the determination
of the word which most closely matches the search
word, independent of the significance of the bit. The
so-called “proximity match” condition is quite useful
in many aspects of pattern recognition, for example,
or other applications where incomplete information is
available for the search criterion.
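A proximity match amounts to ranking the words by the number of mismatching bits. The sketch below illustrates this under simplifying assumptions: the mismatch current is represented by an integer count of mismatching bits, masked bits are written as `None`, and the function names are hypothetical.

```python
def mismatch_counts(words, search):
    """Number of mismatching bits per word.

    words  -- list of stored words, each a list of 0/1 bits
    search -- list of 0, 1 or None (None marks a masked, don't-care bit)
    The count stands in for the magnitude of the mismatch current.
    """
    counts = []
    for stored in words:
        count = sum(1 for s, q in zip(stored, search)
                    if q is not None and q != s)
        counts.append(count)
    return counts


def closest_words(words, search):
    """Indices of the word or words with the smallest mismatch count."""
    counts = mismatch_counts(words, search)
    best = min(counts)
    return [index for index, count in enumerate(counts) if count == best]


if __name__ == "__main__":
    memory = [[0, 0, 0, 1], [1, 1, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1]]
    print(mismatch_counts(memory, [1, 1, 0, 0]))
    print(closest_words(memory, [1, 1, 0, 0]))
```

Every bit carries the same weight in the count, which corresponds to the statement that the result is independent of the significance of the bit.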

Processing of silicon-on-sapphire COS/MOS

Technical considerations 

Great difficulty has been experienced and reported 
by workers 2 attempting to build high quality, active 
silicon devices on sapphire substrates by the straight- 
forward application of standard bulk silicon technology 
to these heteroepitaxial films. These difficulties can 
be traced, in general, to two problems. 

The first is contamination from the substrate, epi- 
taxial system or handling procedures, and the second 
is due to disorder in the epitaxial layer caused by the 
growth interface. It can, therefore, be expected that 
devices and circuits fabricated in heteroepitaxial material
must have the silicon processing adjusted in order
to account for these deviations in properties.

TABLE II — Physical characteristics of heteroepitaxial system components

                                   Silicon (Si)             Sapphire (Al_2O_3)

Crystal unit cell (Å)              face-centered cubic,     a = 4.758,
                                   a = 5.4301               c = 12.991
Density (g/cc)                     2.33                     3.98
Hardness (Mohs)                    7                        9
Melting point (°C)                 1412                     2030
Dielectric constant                11.7                     9.4 (⊥ to C)
                                   (500 Hz - 30 MHz)        (100 Hz - 100 kHz)
Dissipation factor, tan δ          10^-8 - 10^-4            10^-? - 10^-4
Refractive index                   3.4976 at 1.357 μ        1.7707 at 5461 Å
Thermal conductivity               0.30                     0.065 (60° to C)
  (cal/cm-sec-°C at 25 °C)
Thermal expansion coefficient      3.59 × 10^-6             8.4 × 10^-6
  (1/°C, 25 - 800 °C)


Table II is a comparison of some of the physical
characteristics of the components of the heteroepitaxial
system that must be taken into account if high quality
silicon-on-sapphire devices are to be built. From this
data, it is evident that some physical stress and disorder
due to the mismatch of these characteristics is
inevitable.

The effects of disorder and strain on the properties 
of bulk silicon are well known, e.g., base “push out” 
in bipolar transistors. Comparison of what is known to 
occur in bulk and what is observed in SOS yields some 
insight into the processing considerations. The most 
severe problems are : 

1. Accelerated Diffusion 

2. Accelerated Oxidation 

3. Contamination 

The change of diffusion coefficient in bulk silicon
is a function of surface concentration and dislocation
density. The distribution of disorder sites in SOS has
been shown to be highest at the silicon to substrate
interface and decreases as the thickness of the film increases. 3
Due to this distribution, there is a change in
diffusion coefficient causing the impurities to move
faster as they penetrate the film and therefore lateral
diffusion can increase with depth. The resultant diffusion
profiles are depicted in Figure 3. Figure 4
is a photograph of an actual “angle lap and
stain” demonstrating the results of too long a diffusion
of the source and drain regions. Note that the resulting
short of the source to the drain is at the silicon to
sapphire interface. Because SOS has no bulk to dilute
the fast diffusing contaminants, and because the
substrate can itself contribute to the
contamination (Al, O_2, etc.), much greater care must
be taken in handling and substrate preparation. This
consideration is further compounded by the affinity
of contaminants for disorder sites.

Figure 3 — Diffusion failures in bulk silicon and silicon-on-sapphire

Figure 4 — Photomicrograph of diffusion failure in SOS


Finally, oxidation and its effects must be considered 
in the light of what is known to occur in bulk silicon, 
for this is the pillar on which silicon technology is 
built. Here, one finds three major effects. The first is
dissolution of O_2 from the ambient and the generation
of donor states with reported values on the order of
10^18/cc. The second is the precipitation of these impurities
causing large changes in mobility and, finally,
segregation of impurities in the oxide. 4

From the previous discussion, it is obvious, without 
going into the details of the phenomenon involved, 
that bulk silicon technology is not directly applicable 
to the fabrication of high quality complementary 
MOS devices on insulating substrates. The necessary 
alterations in the process involve elimination of 
oxidation where possible and minimization of the time 
that the wafer is exposed to high temperatures. In 
addition, advantage can be taken of the thin film na- 
ture of the technology by utilizing the “deep depletion” 
MOS structure, 5 thereby enabling construction of 
complementary devices in a film of common conduc- 
tivity type. 

Figure 5(a) depicts a wafer of silicon-on-sapphire with
a 300°C deposited oxide defined by photolithographic
techniques in the pattern used to etch away that silicon
not utilized by active devices. Figure 5(b) shows the
pattern left after the silicon is etched from the undesired
region.

After the desired pattern is achieved in the silicon,
a thin layer of boron doped oxide [cross-hatched area,
Figure 5(c)] covered with pure SiO_2 is deposited (300°C)
and etched into the desired P+ regions. This is followed
by phosphorus doped oxide covered by pure SiO_2 and
etched to define the N+ regions as shown in Figure 5(d).

This structure has never been above 300° C and has 
the appropriate doped oxide source defined in the P + 
and N+ regions with the channel regions clean and free 
of oxide. The wafer is then subjected to its only high 
temperature treatment for the time required to drive 
in the diffusants and grow the channel oxide. This is 
indicated in Figure 5(e) with the appropriate diffusions 
drawn in. The final device structure (see Figure 6), 
including metallization, shows the built-up oxides at 
the edge of the active gate region minimizing the 
parasitic overlap capacitance. 

The metallization utilized to complete these structures
was evaporated aluminum and posed some problems
in continuity due to the relatively large silicon
steps the metal was required to pass over (≈ 1 micron).
Figure 7 is a scanning electron photograph of one such
crossover. Note that the metal is thinner than the one
micron of silicon and that the continuity appears suspect.
In fact, it was continuous. By increasing the
thickness of the aluminum used to 15,000 Å or 1.5
microns, this problem was virtually eliminated.

Figure 5 — Low temperature CMOS Process

Unique features of SOS technology 

Several significant advantages result from the utilization
of SOS in the areas of process simplification as
well as improved device and circuit performance. The
use of the “deep depletion” MOS 6 eliminates the need
for a difficult counter diffusion to form complementary
devices while selective silicon removal restricts the
critical silicon areas to the channel regions of the
transistors since all metal interconnects are routed
over the sapphire substrate. This gives complete
freedom from metal to substrate shorts and spurious
channel formation, two major sources of yield reduction
in monolithic MOS arrays.

Figure 6 — Final device structure

Figure 7 — Aluminum metallization over oxidized silicon edge

The most obvious advantage of this technology is the 
substantial improvement in circuit speed due mainly to 
the virtual elimination of all parasitic capacitances 
within the array. As shown in Figure 6, the through 
diffusion of the source and drain contacts to the 
sapphire reduces the contribution of the junction 
capacitance to the side-wall area, one dimension of 
which is only 1 micron thereby cutting this capacitance
by approximately two orders of magnitude over a bulk
device of the same surface dimensions.

Doped oxides as solid diffusion sources serve to fur- 
ther reduce parasitics in the form of gate overlap and 
crossover capacitances and all wiring capacitance is 
completely eliminated. Combined with the low threshold 
voltages (typically 0.5 volt enhancement for both 
device types) and the high field effect carrier mobilities 
of the transistors, the overall result is the realization of 
the full high frequency capability of MOS devices 
within an array environment. Inasmuch as the gain 
bandwidth product of the MOS is comparable to that 
of double diffused bipolar devices, circuit speeds ap- 
proaching those obtained with non-saturating bipolar 
logic gates (nano-second stage delays) can indeed be 
achieved with this technology while retaining the 
power and noise immunity features inherent in com- 
plementary MOS circuitry. 





Integrated circuit design and experimental results

The fabrication of the associative array requires a 
total of five photo-masks each of which was generated 
with the aid of an automatic drafting machine. These 
masks define, in order of processing sequence, the 
isolated silicon islands, boron doped oxide, phosphorus
doped oxide, contact opening and aluminum metallization
patterns. Heavily doped N+ silicon bars are used
throughout as a first layer of interconnection. Extensive
use of symmetry and mirror imaging was made in
the layout as an effective means of reducing chip area.
A photomicrograph of the completed silicon-on-sap- 
phire associative array is seen in Figure 8 with the 
bonding pads appropriately labeled. The chip has 
an active area of 77 X 53 mils, is packaged in a 14 
lead flat pack, and contains a total of 224 MOS devices. 
A test complementary inverter is included within the 
pattern for initial wafer evaluation. Each transistor
in the array (including the test units) has an identical
channel width of two mils, length of 0.4 mil, and
channel oxide thickness of 1800 Å. Characteristics of
typical test devices are shown in Figure 9. Based on 
these parameters and the assumed lateral diffusion 
of about 1 micron on both the source and drain regions, 
field effect mobilities of 150 cm²/volt-sec for holes
and 300 cm²/volt-sec for electrons are obtained from
the characteristics. 

Experiments conducted on fully packaged arrays 
show that a storage cell can be written into with a 
10 volt, 10 nano-second duration pulse with the ar- 
ray at the 10 volt supply level. Minimum sense cur- 
rent during read out is 1 mA as is the minimum 


Figure 8 — 16 bit SOS associative array

Figure 9 — Test device characteristics: (a) N-channel test unit, (b) P-channel test unit

Figure 10 — Associative memory operation


value of mismatch current. Associative memory operation
is best illustrated by referring to Figure 10, which
shows the contents of the memory as well as mismatch
current waveforms generated for three different search
criteria. The result of the first search, for contents
0001, correctly indicates a match in word two only.
Note that the mismatch current in word four, which 
has two bits in error, is in excess of 2 mA while that 
in words one and three is only 1 mA corresponding 
to a single incorrect bit. The second and third photo- 
graphs again show correctly the proper mismatch 
waveforms for search criterion 0001 and 0001, the last 
of which correctly shows a match for all four words 
if the first two bits are ignored. 
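
The search behavior just described can be sketched in a few lines of Python. This is only an idealized model: the stored words below are hypothetical, chosen so that the outcomes agree with the searches reported above, and the mismatch current is assumed to be exactly 1 mA per mismatched bit, whereas the measured values are only approximately so.

```python
# Hypothetical four-word, four-bit memory contents, chosen only so that the
# outcomes agree with the searches described in the text.
MEMORY = ["1001", "0001", "0101", "1101"]  # words one to four

MA_PER_MISMATCHED_BIT = 1.0  # idealized; the text quotes about 1 mA per bit


def mismatch_current(word, key, mask):
    """Return the idealized mismatch current in mA for one stored word.

    mask is a string of '1' (compare this bit) and '0' (ignore this bit).
    """
    wrong = sum(1 for w, k, m in zip(word, key, mask) if m == "1" and w != k)
    return wrong * MA_PER_MISMATCHED_BIT


def search(key, mask="1111"):
    """Return the mismatch current of every word; 0.0 signals a match."""
    return [mismatch_current(word, key, mask) for word in MEMORY]


full = search("0001")                  # all four bits compared
masked = search("0001", mask="0011")   # first two bits ignored

for number, (a, b) in enumerate(zip(full, masked), start=1):
    print(f"word {number}: full search {a} mA, masked search {b} mA")
```

In this model a word matches when its current is zero, and the size of a nonzero current counts the bits in error, which is the basis of the proximity matching discussed next.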

The additional feature of “proximity” matching
alluded to previously is shown more clearly in Figure
11, where the mismatch output is shown for zero, one,
two, three and four bits in error in a given word. Use
of analog detection circuitry at this point will greatly
enhance the utility of the resultant associative memory
system.

Although the work discussed here is of a research and
development nature and the volumes of arrays obtained
are relatively small, it would be remiss at this point to


Figure 11 — Analog mismatch signal (2 mA/div, 10 nsec/div; 0, 1, 2, 3, and 4 bits in error)



avoid any discussion of yield, an all-important factor 
in integrated electronics. It is perhaps even more difficult
to discuss this area when one considers the fact
that in this new technology, a number of problem
areas had to be overcome before any complex arrays
were obtained. From that point on, however, the results
were extremely encouraging as yields of 30 to
50 percent on packaged arrays were obtained with
extremely reproducible device characteristics. These 
represent relatively high yields when compared with 
monolithic MOS circuits of comparable complexity. 
It is believed, again with limited data, that these figures 
are a direct result of SOS technology wherein the 
amount of critical silicon is limited to the channel 
regions of the devices, and that yield depends only 
on this area rather than on total chip area as in mono- 
lithic circuits. In the 16-bit associative array, the critical 
area described represents 180 square mils whereas the 
total chip size is in excess of 5000 square mils, so that 
significant improvements in yield should and do
result.
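
The sensitivity of yield to critical area can be illustrated with a simple Poisson yield model, Y = exp(-D·A). That model is an assumption introduced here for illustration only; the text does not say which yield model, if any, underlies the argument. The sketch below backs out the defect density implied by the observed yields over the 180 square mil critical area, then asks what the same density would give if the whole 5000 square mil chip were critical. It shows a trend, not a prediction of actual monolithic yields.

```python
import math

CRITICAL_AREA_SQ_MILS = 180.0   # channel-region silicon, from the text
CHIP_AREA_SQ_MILS = 5000.0      # total chip size, from the text


def defect_density(yield_fraction, area):
    """Defects per square mil implied by a yield under the Poisson model."""
    return -math.log(yield_fraction) / area


def poisson_yield(density, area):
    """Poisson yield for a given defect density and sensitive area."""
    return math.exp(-density * area)


for observed in (0.30, 0.50):
    d = defect_density(observed, CRITICAL_AREA_SQ_MILS)
    whole_chip = poisson_yield(d, CHIP_AREA_SQ_MILS)
    print(f"observed yield {observed:.0%}: "
          f"{d:.4f} defects/sq mil, whole-chip yield {whole_chip:.1e}")
```

Under these assumptions the yield falls by many orders of magnitude when the sensitive area grows from the critical area to the full chip, which is the qualitative point made above.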

SUMMARY 

System, circuit, and device processing concepts have 
been developed and have resulted in the successful reali- 
zation of high performance silicon-on-sapphire associa- 
tive memory arrays. Features of the array include high 
speed current sensing for mismatch detection and non- 
destructive read out. The complementary MOS process 
sequence utilized in the array fabrication resulted in 
yields as high as 50 percent and produced high quality 
complementary devices with field effect mobilities 
of 300 and 150 cm²/volt-sec for electrons and holes,
respectively. The drastic reduction of parasitic capaci- 
tance inherent in SOS technology combined with these 
device characteristics provides a performance level 
equivalent to the highest speed bipolar circuits while 
retaining all of the other desirable circuit and processing 
features of MOS arrays.

ACKNOWLEDGMENTS 

The authors gratefully acknowledge the contributions 
of D. J. Dumin and his associates in providing the 
silicon-on-sapphire films used in this work. A. O'Toole, 
J. Sokoloski and Mrs. R. Gilchrist are responsible
for the array processing while W. Salt and Mrs. B. 
Denton diced, mounted, and bonded the sapphire pel- 
lets. Thanks are due to G. Cullen and G. Gottlieb for 
supplying the material from which the low temperature 
processing sequence evolved. R. Powlus contributed 
to the initial phase of associative array design. J. 
Previte of Rome Air Development Center deserves 
considerable credit for recognizing the potential bene- 
fits of thin film active devices. 


REFERENCES 

1 J ALLISON J BURNS F HEIMAN
Silicon-on-sapphire complementary MOS memory cells
IEEE J Solid State Circuits Dec 1967

2 C Y WRIGLEY L J KROKO
Properties of the silicon-sapphire interface in heteroepitaxy
Electrochemical Society Semiconductor Silicon Abstracts
May 1969

3 D DUMIN
Deformation of and stress in epitaxial silicon films on single
crystal sapphire
J Appl Phys Vol 36 1965 2700

4 E C ROSS G WARFIELD
Effects of oxidation on electrical characteristics of silicon-on-
sapphire
J Appl Phys Vol 40 1969 2339

5 F HEIMAN
Thin film silicon-on-sapphire deep depletion MOS transistors
IEEE Trans on Electron Devices Vol 13 1966 855





A main frame semiconductor memory 
for fourth generation computers 

by THOMAS W. HART, JR., DURRELL W. HILLIS, 

JOHN MARLEY, ROBERT C. LUTZ and CHARLES R. HOFFMAN 

MOTOROLA, SPD 
Phoenix, Arizona 


INTRODUCTION 

It has been obvious for several years that Large Scale 
Integration could be applied to memories. Memories 
offer several advantages in that a large volume of one 
type of device can be manufactured, and that the design
can be optimized for one application. There exists
a wide spectrum of memory product areas with varying
size, cost, speed and environmental performance. Most
of these application areas are presently serviced by 
various forms of magnetic storage. 

Semiconductor memories have been encroaching into 
some of these areas. First, the “scratchpad” was re- 
placed by semiconductor memories yielding a better 
performance at lower cost. Secondly, the small buffer 
memories are now being implemented by various forms 
of semiconductor storage, mainly by MOS shift regis- 
ters. Large very high speed semiconductor buffers are 
being built for large systems such as the IBM 360/85 
to effect a hardware performance increase of slower 
core main memories. 

It is felt that the advent of an all semiconductor 
main frame memory is fast approaching. The initial 
market penetration will be in the high performance 
area (100-300 ns) replacing flat-film memory designs 
where costs per bit are quite high. Eventually, most 
memory application areas will be vulnerable to semi- 
conductor implementation on a price and performance 
basis. This paper will describe a memory module which 
will be used as a building block to implement high 
performance memories in the next generation of com- 
puters. 


Module description 

Under various engineering and marketing con- 
straints, a module building block concept evolved. This 
module in its general form contains 8192 bits. Inter- 
face to and from the module is performed with standard 
current-mode logic levels. MECL levels were chosen 
because that logic family provides the fastest inter- 
face when connecting many modules into a large mem- 
ory system. Also, most of the customers and potential 
customers working on high speed systems are using some 
form of current-mode logic. In any event, it is not dif- 
ficult to interface from other logic families to MECL 
levels. 

By varying the logical connections to the module, an 
organization of 8192X1, 4096X2, 2048X4, or 1024X8, 
can be obtained. Figure 1 shows a block diagram of
the module. Addressing is binary. Inputs and outputs 
may be bussed with other modules for expansion of 
the number of locations in a memory system. No com- 
plicated timing is necessary to operate the module. 
When an address is applied, the contents of the speci- 
fied address will appear at the output terminals within 
85 ns and remain until a new address is presented. 
Writing in a specified location is accomplished by 
pulsing the write enable line after the address and data 
have been presented. The module can be cycled every 
100 ns. 

The memory module uses p-channel MOS flip-flops 
for storage. Address decoding, word drive, sense, and 
digit drive are accomplished with bipolar circuits. 
This combination results in a low power, low cost

Figure 1 — Block diagram 8192 bit module


memory array, while retaining high speed module 
performance because of the bipolar circuits. The mem- 
ory array itself contributes only a small fraction of the 
time used in a memory cycle (see timing diagram, 
Figure 14). The cycle-time is mainly determined by 
the bipolar circuits peripheral to the MOS storage
array.

The memory module was designed to operate on 
±5v power supplies since these are fairly standard in 





integrated logic circuits. Total power dissipation is 
about six watts. While readily accomplished, no at- 
tempt was made to reduce power by various switching 
and pulse powering schemes since this level of power 
density can be easily handled in most applications by 
forced air cooling. 

Electrical description 

The module is a multipackage hybrid assembly. 
Four different integrated circuits are used to construct 
the module. These chips are (1) 256 bit MOS storage 
array, (2) Array Select Circuit, (3) Word Decode and 
Drive Circuit, and (4) Sense-Digit Circuit. The com- 
plete module has 32 Storage Arrays, four Array Select 
Circuits, two Word Drive Circuits, and four Sense- 
Digit Circuits. 

Storage array chip 

A block diagram of the 256 bit MOS Storage Ar- 
ray chip is shown in Figure 2. The array is organized 
in 2D fashion as 32 words X 8 bits. The linear 
select organization minimizes the number of devices per 
storage cell and also the number of inter-connections 
on the chip. Unfortunately, linear select organization 
causes some complications in packaging. These prob- 
lems are circumvented here by placing sense line 
switches on the same chip as the array. This provides 
two benefits. First, additional addressing can be per- 
formed with the sense switches improving decoder 



Figure 2 — Block diagram MOS storage array


Figure 3 — Storage cell circuit schematic

efficiency. Second, the internal capacitance of the
bit-lines can be isolated from the external bit lines by 
the sense switches, substantially improving the sense 
loop time constant. 

A schematic of a storage cell and the MOS sense- 
switches at the end of the bit line is shown in Figure 
3. Q1 and Q2 are the active devices of the flip-flop, R1
and R2 are the flip-flop load devices, and Q3 and Q4
are the series gating devices which connect a selected
cell to the bit line pair. Each bit line has a transistor
in series with the connection to the bonding pad
and a transistor QEN which terminates the bit line
to ground when it is on. The geometries of the active
devices are designed to provide a sense current of
80 microamperes under worst case processing and 
operating conditions. The load resistor device geome- 
tries determine the standby power dissipation of the 
chip which in this case is about 40 milliwatts. 



Figure 4 — 256 Bit MOS storage array

Figure 4 is a photomicrograph of the chip. The
dimensions of the chip are 138 mils X 141 mils. A 
low threshold process using <100> material is used. 
The substrate serves as the buss for the +5 volt supply. 

One layer of metal interconnection is used. A high
concentration P-diffusion (15-20 Ω/square) is used for
crossunders so as to minimize series resistance. In the
layout the bit lines have no crossunders. The word 
lines have nine crossunders. The resistance of these 
crossunders and capacitance associated with the word 
gates on the memory cell form an RC delay line. In 
this design the delay is about 2.5 ns. 

Chip selection circuit 

A bipolar circuit which decodes three binary bits is 
used to select one of eight MOS Storage Array Chips. 
Each of the output driver stages provides the complementary
signals, EN and its complement, necessary to drive the
sense switches on the MOS Storage Array. Additional
inputs to the Chip Selection Circuit are provided to
select groups of eight arrays.

Emitter Coupled Logic (ECL) input signals are
translated to saturated logic which is referenced to
the negative supply (-5.0 volts). The complementary output
stages provide logic levels near the positive (+5 volts)
and negative (-5 volts) supplies for driving the MOS sense
switches. Block and logic diagrams are shown in
Figures 5 and 6.

Memory package 

Eight MOS Storage Array chips and one chip selec- 
tion circuit are contained in a 1.25 inch square memory 
package. Interconnection of these nine chips is made 
by a beam lead laminate as described later in this paper. 
Each memory package contains a total of 2048 bits 





Figure 6 — Chip select circuit logic 


as shown in Figure 7. Four such packages form the 
storage portion of the 8192 bit memory module. This 
assembly of four packages results in a total capacitance 
buildup of 250 picofarads on the word lines and 70 
picofarads on the sense-digit lines. 

Decoding word driver 

Selection of the storage array word lines is accom- 
plished by a bipolar circuit which decodes four address 
bits and drives one out of sixteen word lines. As in the 
Chip Selection Circuit, ECL input signals are translated 
to saturated logic whose outputs provide logic levels 
near +5 and —5 volts. Block and logic diagrams are 
shown in Figures 8 and 9. Two of these chips are 
packaged in a 1.25 inch square package similar to the 
memory array package except that interconnection 
within the package is made with a thick film metallization
and wire bonds. Two address enable inputs are
provided. One is used as a master enable and the 
other is used as a one bit decode to select one or the

Figure 8 — Decoding word driver chip

Figure 9 — Decoding word driver chip logic


other of two Decoding Word Driver chips sharing the 
same package. A block diagram of this package is 
shown in Figure 10. 

Sense amplifier-digit driver 

The sense amplifier-digit driver subassembly con- 
tains four identical sense amplifier-digit driver 
integrated circuit chips. Each chip receives and sends 
read and write signals to the MOS storage array, ac- 
cepts ECL level data input and data enable signals, 
and generates ECL data output signals. 

The purpose of each chip is, of course, twofold. 



First, when it has been properly enabled for writing, 
it must transmit a write signal to appropriate bit(s) 
of the selected word in the storage array. Second, when 
properly enabled, it must sense the storage cell cur- 
rents of the selected word and translate them to ECL 
signals at the data output terminals. 

The logic diagram shown in Figure 11 is functionally 
equivalent to the sense amplifier-digit driver circuit. 
In addition to showing the basic sense amplifier, digit 
driver, and gate blocks of the sense amplifier-digit
driver chip, Figure 11 also shows the existence of a bit 
line recovery circuit. The purpose of this circuit is 
to rapidly return all bit line voltages to zero, immediate- 
ly after each write operation. 

To thoroughly understand the sense amplifier- 
digit driver logical organization, consider the sequence 
of events which must occur to perform the read and 
write operations. 

To accomplish a write operation, the desired input 
data is placed at the DATA IN terminals of the chip. 
The data is enabled by a coincidence of logical zeroes 
at the DATA ENABLE inputs. When the WRITE 
ENABLE input is forced to a logical zero, one of the 
bit line voltage drivers in each half of the circuit 
drives one line of each bit line pair to approximately 
+4v. This voltage impressed on a bit line accomplishes 
the write in the storage array. The leading negative 
edge of the WRITE ENABLE signal also sets the 
recovery circuit flip-flop. The following positive edge 
of the WRITE ENABLE signal turns the digit driver(s)
off and turns the recovery circuit driver on. When 
recovery of all bit lines is accomplished, the recovery 
circuit flip-flop resets and the recovery circuit driver 
is shut off. Both the digit driver and the recovery circuit
driver are designed to exhibit a very high output

Figure 10 — Decoding word driver package


Figure 11 — Sense amplifier/digit driver chip logic

impedance when off, such that they do not interfere
with the read operation. 

Reading is accomplished by enabling either one or 
both halves of the chip with the DATA ENABLE 
signals. If the WRITE ENABLE is held at logical 
one, the bit line currents flow into the sense amplifier
inputs. The sensed information is made available at
the DATA OUT terminals. Since the I/O signals are 
ECL, uncommitted emitter outputs are used so that 
wired OR'ing of the positive going output signals is 
possible. 

Figure 12 shows a block diagram of the sense ampli- 
fier-digit driver package. Since the DATA OUT 
signals from all four sense amplifier-digit driver chips 
can be OR’ed, various connections of the DATA ENABLE
and DATA IN signals are possible. If the
DATA ENABLES are connected for maximum decoding,
a one-out-of-eight selection can be accomplished.
With all eight DATA IN inputs and DATA
OUT outputs strapped together, the module organization
becomes 8192 words of one bit. Similarly, if all
DATA ENABLES are tied together, each DATA
IN and DATA OUT is used as a separate information
channel, and the resultant module organization is


Figure 12 — Sense-digit package


1024 words of eight bits. Other connections result in 
4096 words of two bits, and 2048 words of four bits. 
These various connections occur external to the module. 
Hence, the sense amplifier-digit driver plane organi- 
zation is the same regardless of the final module organ- 
ization desired. 
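
The four organizations follow directly from the 8192 bit capacity and the eight sense-digit pairs. The short sketch below enumerates them, together with the number of extra address bits that would have to be decoded on the DATA ENABLE inputs in each case. The function and constant names are hypothetical and serve only this illustration.

```python
MODULE_BITS = 8192
SENSE_DIGIT_PAIRS = 8  # eight bit-line pairs reach the sense-digit package


def organizations():
    """List (words, bits_per_word, extra_address_bits) for each strapping."""
    result = []
    width = SENSE_DIGIT_PAIRS
    while width >= 1:
        words = MODULE_BITS // width
        # Each halving of the word width needs one more address bit,
        # decoded on the DATA ENABLE inputs.
        extra_bits = (SENSE_DIGIT_PAIRS // width).bit_length() - 1
        result.append((words, width, extra_bits))
        width //= 2
    return result


for words, width, extra in organizations():
    print(f"{words} words of {width} bit(s): {extra} extra address bit(s)")
```

The widest organization needs no extra decoding, and the one bit organization needs three extra bits, corresponding to the one-out-of-eight selection mentioned above.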

Module electrical organization 
Figure 13 shows an integrated electrical schematic 



Figure 13 — 8192 bit memory stack-electrical organization

of the 8192 bit memory module. Notice that the package-to-package
connections are accomplished by means
of long parallel busses or rails which are formed when 
the six packages are stacked. Although these rails 
must be broken in some places to define the interconnection,
no jumper wires, etc., need be added.

There are three distinct methods of addressing 
shown in Figure 13 : 

1. Address bits A0, A1, A2, and A3 are bussed in
parallel to the two decoding word driver chips (DWD).
Since each DWD is a one out of sixteen decoder, an
additional address bit (A4 and its complement) is decoded on the
enable inputs of the DWD’s. Hence, the decoding
word driver package functions as a one out of thirty- 
two decoder, selecting only one of 32 word select lines 
to enable in the memory array. Note that all word 
select lines in the thirty-two MOS array chips are 
wired in parallel. 

2. Address bits A5, A6, and A7 drive the chip select circuits
(CSC) in the memory array packages. The CSC drives
the ENABLE line and its complement on the memory arrays’ sense
line switches, and selects one of the eight array chips
within each memory array package.

3. To complete the
selection of an eight bit word on one MOS array, one
of the four memory planes must be uniquely enabled.
The two enable inputs of each chip select circuit
(CSC) are connected to address bits A8 and A9 or
their complements to perform the plane selection.
Connection to the correct two signals is accomplished
by providing A8, A9, and their complements at the edge of all
memory array packages and breaking the bus connection
with two of the signals.

These ten bits (A0-A9) select one eight bit word 
out of the 1024 word array. Subdivision of the eight 
bit word into four, two, or one bit words is accomplished 
by strapping lines (DE1-DE6) together in specific
combinations. The sense amplifier package can accept 
zero, one, two, or three address bits depending on the 
module organization desired. 
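
The ten bit selection described above can be summarized as a split of the address into three fields. The sketch below is a software analogy only: it assumes A0 is the least significant bit, which the text does not state, and the function name is hypothetical.

```python
def decode(address):
    """Split a 10-bit address (A0 assumed to be the least significant bit)."""
    if not 0 <= address < 1024:
        raise ValueError("address must fit in ten bits (A0-A9)")
    word_line = address & 0x1F      # A0-A4: one of 32 word select lines
    chip = (address >> 5) & 0x7     # A5-A7: one of 8 chips in a package
    plane = (address >> 8) & 0x3    # A8-A9: one of 4 memory array packages
    return word_line, chip, plane


word_line, chip, plane = decode(0b10_011_00101)
print(f"word line {word_line}, chip {chip}, plane {plane}")
```

Every one of the 1024 addresses maps to a distinct (word line, chip, plane) triple, so each address reaches exactly one eight bit word among the thirty-two arrays.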

Once a single eight bit word in the storage array has 
been addressed, sixteen low impedance paths (eight 
pairs of bit lines) exist between the four sense ampli- 
fier-digit driver chips and the eight MOS array cells. 
Reading is accomplished by sensing the storage cell 
currents while holding the bit lines at approximately 
ground; writing by forcing selected bit lines to a posi- 
tive voltage. 

In addition to the standard I/O and address chan- 
nels, a DWD ENABLE and/or CSC ENABLE are 
brought out to allow for further addressing or for 
eliminating skew in the address signals. 


Module timing 

Figure 14 is a timing diagram of the 8192 bit module. 
The diagram is organized into three sections: (1) the
basic addressing and enabling common to both read
and write cycles, (2) the basic read cycle, and (3) the
basic write cycle. The diagram illustrates how the 
various propagation and charging delays add to form 
the minimum cycle times. 

Since all word select and bit lines are parallel con- 
nected to all thirty-two array chips, the capacitances 
associated with these lines are high (approximately 
250 pf on word lines, 70 pf on bit lines). Therefore, 
charging times become a significant portion of the 
memory cycle time. 
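
A rough feel for these charging times comes from the constant-current relation t = C·ΔV/I. In the sketch below the capacitances are those quoted above, and the voltage swings are read from the logic levels given elsewhere in the paper (about 10 volts on a word line, about 4 volts on a bit line during writing). The drive current is purely hypothetical, since the paper does not quote one, so the resulting times are illustrative only.

```python
WORD_LINE_CAPACITANCE_F = 250e-12   # from the text
BIT_LINE_CAPACITANCE_F = 70e-12     # from the text

WORD_LINE_SWING_V = 10.0   # levels near -5 and +5 volts
BIT_LINE_SWING_V = 4.0     # bit line driven to roughly +4 volts when writing

ASSUMED_DRIVE_CURRENT_A = 0.1   # hypothetical value, not from the paper


def charging_time(capacitance, swing, current):
    """Time for a constant current to slew a capacitance through a swing."""
    return capacitance * swing / current


word_time = charging_time(WORD_LINE_CAPACITANCE_F, WORD_LINE_SWING_V,
                          ASSUMED_DRIVE_CURRENT_A)
bit_time = charging_time(BIT_LINE_CAPACITANCE_F, BIT_LINE_SWING_V,
                         ASSUMED_DRIVE_CURRENT_A)
print(f"word line: {word_time * 1e9:.1f} ns, bit line: {bit_time * 1e9:.1f} ns")
```

With a drive current of this order, the word line charging time alone is a sizable fraction of a 100 ns cycle, which is consistent with the remark above.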

The labels on the diagram are generally self ex- 
planatory. The comments below explain some of the 
special features shown. 

Notice that if desired, the sense amplifier-digit 
driver data enable inputs (DE1-DE6) can arrive 
at the module terminals some 15 ns later than the ad- 
dress signal (A0-A9). Therefore, additional levels of 
decoding logic may be added in series with these in- 
puts without slowing the cycle time. 

During the read cycle the sensed information must 
be strobed out toward the end of the cycle, as the in- 
formation on the sense amplifier-digit driver DATA 
OUTPUTS prior to that time is the stored information 
in the previously addressed location or undefined. Bit 
line recovery after writing is overlapped into the next 
cycle. 

Packaging 

The 8192 Bit Memory Module is an assembly of 
four 2048 Bit Memory Array packages, a word driver 
package, and a sense digit package. Each of these
packages is 1.25 inches square with 17 leads on 50
mil centers on each side. The electrical organization 
is such that the packages can be stacked one above the 
other with the leads bussed. After assembly and test 
of the individual packages are complete, the individual 
leads are cut and formed. The packages are then 
placed in a mechanical holder, and wave soldered one 
side at a time. A molded header is then mechanically 
and electrically attached to the base of the stacked 
assembly of packages. The header includes guide
slots so that the ends of the ribbon leads can be in- 
serted into a simple etched wiring board nested within 
the header. The 42 signal pins of the header are ar- 
ranged in a 1.6 inch square on 100 mil centers. This 
choice of pin form factor permits established printed 
circuit board technology to be employed by the user.

Figure 14 — Timing diagram - 8192 bit MOS stack memory


The heat generated by this module is approximately 
6 watts. The volume of the module assembly including
the 42 pin header and plug is 1.75 X 1.75 X
2.0 inches or six cubic inches. Operational bit density
is thus over 1300 bits per cubic inch. It is felt that these 
two counteracting factors are fairly well balanced to 
each other by this module design concept. 
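
The density figures can be checked with a few lines of arithmetic, taking the module dimensions as 1.75 by 1.75 by 2.0 inches. The power per bit and power per cubic inch are derived here for convenience and are not quoted in the paper.

```python
MODULE_BITS = 8192
POWER_WATTS = 6.0                    # approximate dissipation, from the text
DIMENSIONS_IN = (1.75, 1.75, 2.0)    # module including header and plug

volume = DIMENSIONS_IN[0] * DIMENSIONS_IN[1] * DIMENSIONS_IN[2]
bits_per_cubic_inch = MODULE_BITS / volume
watts_per_cubic_inch = POWER_WATTS / volume
milliwatts_per_bit = 1000.0 * POWER_WATTS / MODULE_BITS

print(f"volume: {volume:.3f} cubic inches")
print(f"bit density: {bits_per_cubic_inch:.0f} bits per cubic inch")
print(f"power density: {watts_per_cubic_inch:.2f} watts per cubic inch")
print(f"power per bit: {milliwatts_per_bit:.2f} milliwatts")
```

The volume comes out slightly above six cubic inches, and the bit density is consistent with the figure of over 1300 bits per cubic inch.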


Package interconnect 

The Sense-Digit and Word Driver packages are 
simply one layer thick film metal patterned ceramic 
packages with connections being made with flying 
wire leads. The memory array package is much more
densely populated and uses a new technology of interconnect.

Interconnect technologies currently available are
multilayer ceramic, multi-layer surface deposition, or
multi-layer “add on” laminate. Ceramic multi-layer
was not selected for three reasons. The high dielectric 
constant of alumina raises the distributed capacitance 
to levels which threaten system speed requirements. 
The many vias required for inter-layer connection
cannot be placed on close enough centers to be compatible
with the desired cell densities. Finally, the length
of buried conductors used for power distribution ex- 
hibit higher resistance than is desired for low noise 
level operation. 


Use of multi-layer surface depositions on a suitable 
package substrate has been avoided because of yield 
problems of dielectric defects in the presence of many 
crossovers, and the presence of deposition interfaces 
at each via buildup location. Surface deposition of 
a single low-impedance thick film pattern on the package 
substrate has been utilized for power distribution to 
the IC chips within the container. 

Interconnect laminate 

The interconnection is implemented by a separate 
part called an interconnect laminate. The dielectric 
core of this laminate is 1 mil polyimide film and exhibits 
the following characteristics : 

1. Physically and electrically stable dielectric 
through the range of — to +450°C. 

2. Dielectric constant of 3.5. 

3. Pinhole free and a voltage breakdown rating 
of 7000 volts at one mil thickness. 

Through a series of precisely-registered artwork and 
photo-chemical cycles, the two-layer X-Y intercon- 
nections are formed to the following standards : 

1. Via size of 1.5 to 2.0 mils diameter on seven mil 
centers. 

2. Via lands or caps are 4 X 5 mils. 

3. Conductor widths are three mils and conductors 
are spaced on five mil centers.

4. Conductors on both sides and the vias are
electroplated as a single structure having no 
metallurgical interfaces at the vias. 

These standards of fabrication provide an intercon- 
nect system which is compact and comparable in 
geometry details to the bond pads and spacing used by 
MOTOROLA on its IC chip products. Capacitance 
measurements of typical center conductors to the sum 
of grounded neighboring conductors, using the above 
dimensional and material standards, read about 2.15 
picofarads per inch of length while dc resistance of 
typical conductors measures 0.40 ohms per inch of
length. 
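
To see what these per-inch figures mean for signal delay, the conductor can be treated as a single lumped resistor and capacitor. That treatment is a simplifying assumption made here (a distributed line would give a somewhat smaller effective delay), and the conductor lengths used below are hypothetical.

```python
CAPACITANCE_PF_PER_INCH = 2.15   # measured value, from the text
RESISTANCE_OHM_PER_INCH = 0.40   # measured value, from the text


def lumped_rc_ps(length_in):
    """Lumped RC time constant in picoseconds for a conductor of given length.

    Ohms multiplied by picofarads gives picoseconds directly.
    """
    resistance = RESISTANCE_OHM_PER_INCH * length_in
    capacitance = CAPACITANCE_PF_PER_INCH * length_in
    return resistance * capacitance


for length in (0.5, 1.0, 2.0):   # hypothetical run lengths in inches
    print(f"{length} inch: {lumped_rc_ps(length):.2f} ps")
```

Even for a run of an inch or two the time constant is a few picoseconds at most, which is negligible against the nanosecond delays elsewhere in the module.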


Of the three primary methods for connecting signal 
lines to the IC chips pads (wires, bumps, and beam 
leads), a system employing beam leads is used. These 
beams are integrated into the laminate plating rather 
than using the more customary method of integrating 
the beams into the IC chip. This was done to avoid ad- 
ditional processing steps to the already complex wafer. 
Not only is the silicon wafer yield protected, but a 
packaging thermal advantage is obtained by being 
able to beamlead bond “face up” against the IC chips. 
The rear surface of the chips is then mechanically 
secured to the substrate base, assuring low thermal 
resistance. 

A cross-section diagram showing the features of the 
inter-connect laminate, the ultrasonic “face up” 


bonding technique, and the heat sinking capability to 
the power-carrying cermet metalized alumina sub- 
strate are illustrated in Figure 15. The face-up technique 
permits bonding to the chip one beam lead at a time

Figure 16 — Packaging for memory stack

or one chip at a time. It also permits quite stringent
quality control measures to be implemented since the
beams can be examined individually.

Memory package 

A sketch of the package which is being used is illustrated
in Figure 16. It consists of a 1.28 inch square,
96 percent alumina base, which is metallized to a cus- 
tom pattern containing 68 metal film leads which go 
under a glass-sealed side wall. The base of the usable 
1.0 square inch interior contains the power distribu- 
tion pattern. The headroom within the package is 60 
mils. 

As can be seen by inspection of the figure, the area 
occupied by the IC memory chips and the control 
chip is approximately 25 percent of the area, the re- 
maining area being used for the X-Y interconnect and 
exit bond functions.

In the assembly cycle, a total of 448 beam leads
are bonded to the IC chips which is half of the bonds 
required by wire bonding techniques. The laminate 


contains 480 electrically active plated feed throughs. 
Larger beam leads are employed to connect the inter- 
connection laminate to the exit bond pads and the 
power distribution. A total of 73 such bonds are re- 
quired. In the computer program which generated the 
interconnect laminate artwork master sets, approximately
1400 conductor track segment instructions
were generated. The cover is alloyed to the package 
subassembly after precap testing. The result is a 
memory component containing 2048 MOS memory 
cells and having only 68 leads to the outside world. 

CONCLUSION 

A high performance memory module has been described 
which is suitable for use as a building block for large 
mainframe memories. Mass production of this memory 
module is planned. Costs per bit of a memory system 
using these modules as basic building blocks will be 
much lower than that of other technologies giving a 
similar performance. In the near future the competi- 
tive pressure of semiconductor memories will be felt in 
most performance ranges. Magnetics watch out! 



A new approach to memory and logic- 
cylindrical domain devices 


by A. H. BOBECK, R. F. FISCHER and 
A. J. PERNESKI 

Bell Telephone Laboratories 
Murray Hill, New Jersey 

INTRODUCTION 

Magnetic domain behavior in single crystal magnetic 
oxides has been studied extensively over the last 
several decades. These investigations, both theoretical 
and experimental, are an attempt to better understand 
these materials and their complex domain structures. 
Recently single crystal oxides have been utilized in 
memory and logic devices. This paper will update 
work on cylindrical domains in orthoferrites first 
published in 1967 and later discussed at the 1968 and 
1969 Intermag Conferences.1,2,3

A cylindrical domain, sometimes referred to as a 
bubble, is a localized high energy magnetic state. 
Such a domain is stable and resists any attempt to 
deform it. Domains can be moved about in much the 
same way as a charged particle. A domain can be 
moved one domain diameter in less than 100 nanoseconds,
thus indicating that data rates in excess of
10^6 bits/sec can be realized in this technology. As yet
no upper limit to the cylindrical domain velocity has 
been found experimentally. 

Successful device utilization of cylindrical domains
depends upon developing techniques for generating,
propagating, interacting and detecting these domains.
Domains can be generated by sectioning an existing
domain into halves. Each new domain can be considered
as an information input if the splitting operation
is selectively controlled. A stream of domains, fed into
a propagation channel and transmitted to an output
point, can be detected by optical, Hall or induced
voltage readout. Although all these readout techniques
have been studied, only induced voltage readout will
be detailed in this paper.

A new class of materials, the orthoferrites, 4,5 is
now available which, in addition to supporting cylindrical
domains at densities approaching 10^6 per square
inch, has the combined properties of high nucleation
fields (so domains will not spontaneously appear), low
domain wall coercivities, and high domain wall mobilities.
A description of the general properties of
cylindrical domains 6,7 in orthoferrites is followed by
a section on the behavior of domains in gradient
fields. Conductor circuits, “angelfish” circuits 8 and
in-plane rotating field circuits 9 are presented as general
methods to propagate domains. Finally the relevance
of domain wall devices to the computing field is discussed.

General observations 

If we take a thin platelet of orthoferrite above its
Néel temperature and cool it to room temperature,
spontaneously nucleated serpentine-like strip domains
will be present. Such a domain pattern, as seen in
Figure 1, will usually include a number of single wall
domains. A single wall domain can be identified by
noting whether the wall which bounds it closes upon
itself. If a prescribed magnetic field, the bias field,
is applied normal to the surface of the platelet, the
single wall domains become cylindrical. An array of
such domains is shown in Figure 2. The 1.7 mil thick
platelet of Sm0.55Tb0.45FeO3 orthoferrite is subjected to
a 42 Oe bias field.

Figure 1 — Strip domains, 1.5 mils in width, in a 1.7 mil thick
platelet of Sm0.55Tb0.45FeO3 orthoferrite viewed by Faraday effect.
Note the single wall domains. Bias field is zero


Figure 2 — With a 42 Oe bias field the single wall strip 
domains of Figure 1 become cylinders each 1.8 mils 
in diameter 


Those familiar with the earlier references will recall
that cylindrical domains are stable over a limited
range of the bias field (typically 10 percent of 4πM_s).
An excess bias causes the domain to collapse inward.
On the other hand, as the bias is decreased the domains
grow in size, eventually reaching a diameter at which
they become unstable to elliptical perturbations and
then suddenly grow into long strip domains.

A strip domain can also be cut by energizing a
conductor positioned in contact with the orthoferrite
and intercepting the strip domain at right angles.
For SmTb orthoferrite a current of 300 mA is sufficient.
Later, in the discussion of conductor propagating
circuits, a technique for splitting cylindrical
domains will be presented.

Manipulation of cylindrical domains — General 

Domains in orthoferrites are maintained in the 
preferred cylindrical form by an overall uniform bias 
field applied normal to the platelet surface. As discussed
previously, an increase in the bias field decreases
the domain diameter and vice versa. Now consider 
the reaction of a cylindrical domain subjected to a 
nonuniform rather than a uniform field. The response 
will be complex and could involve a change in size, 
motion at a nonuniform rate or even the collapse of 
a domain. However, it is possible to treat the case 
in which a uniform gradient field is applied. 

Consider, as shown in Figure 3, a cylindrical domain
of diameter 2r in a uniform gradient field. The domain
will experience a force attempting to move it toward
a position of reduced bias. To overcome the wall
coercivity, H_c, the following condition must be met:

$$\Delta H > \frac{8 H_c}{\pi} \qquad (1)$$

Here ΔH is the change in field across the domain, from
H_0 on one side to H_0 + ΔH on the other, as labeled in
Figure 3. Furthermore, it can also be shown that the domain
velocity, v, is given by

$$v\,(\text{cm/sec}) = \frac{\Delta H\,(\text{Oe})\;\mu\,(\text{cm/sec/Oe})}{2} \qquad (2)$$

where μ is the usual domain wall mobility. 6

Figure 3 — A cylindrical domain of diameter 2r positioned
in a uniform gradient field
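
As a numerical illustration of conditions (1) and (2), the short Python sketch below evaluates the threshold field difference 8H_c/π and the velocity ΔH·μ/2. The function names and the sample mobility are assumptions made for illustration only; the 0.25 Oe coercive force is the value quoted in the text for materials such as the ytterbium orthoferrite.

```python
import math


def threshold_field_difference(h_c_oe: float) -> float:
    """Field difference across the domain (Oe) that must be exceeded to
    overcome the wall coercivity H_c, per condition (1): dH > 8*H_c/pi."""
    return 8.0 * h_c_oe / math.pi


def domain_velocity(delta_h_oe: float, mobility_cm_per_sec_oe: float) -> float:
    """Domain velocity (cm/sec) per equation (2): v = dH * mu / 2."""
    return delta_h_oe * mobility_cm_per_sec_oe / 2.0


if __name__ == "__main__":
    h_c = 0.25      # Oe, coercive force quoted in the text
    mu = 1000.0     # cm/sec/Oe, hypothetical mobility for illustration
    delta_h = 2.0   # Oe, hypothetical field difference across the domain

    needed = threshold_field_difference(h_c)
    print(f"threshold field difference: {needed:.3f} Oe")
    if delta_h > needed:
        print(f"domain moves at about {domain_velocity(delta_h, mu):.0f} cm/sec")
    else:
        print("field difference too small to overcome coercivity")
```

Equation (2) as written does not subtract the coercive threshold, so the velocity function is only meaningful once condition (1) is satisfied.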

One method to see the effect of a gradient field is to
interact one domain with another. In the case of
widely separated domains, the far field of a cylindrical
domain can be approximated as that of a dipole and the
following relationship derived (see Figure 4):

$$\frac{H_c}{4\pi M_s} = \frac{3\pi r^3 h}{8\lambda^4} \qquad (3)$$

Here h is the platelet thickness. Equation (3) specifies λ,
the minimum stable separation between a pair of domains
as they repel one another because of their mutual
gradient fields.
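
The relationship above can be turned around to estimate the minimum stable separation. The sketch below assumes that equation (3) reads H_c/(4πM_s) = 3πr³h/(8λ⁴), the form that follows from the dipole approximation combined with condition (1). The function names and the sample values are illustrative assumptions and do not correspond to a single measurement reported in the text.

```python
import math


def min_stable_separation(r: float, h: float, four_pi_ms: float, h_c: float) -> float:
    """Solve the assumed form of equation (3) for the separation.

    r and h are the domain radius and platelet thickness in the same
    length unit (the result is returned in that unit); four_pi_ms is
    4*pi*M_s in gauss and h_c is the coercive force in Oe.
    """
    return (3.0 * math.pi * r**3 * h * four_pi_ms / (8.0 * h_c)) ** 0.25


def coercivity_for_separation(r: float, h: float, four_pi_ms: float, sep: float) -> float:
    """Inverse relation: coercive force at which `sep` is just stable."""
    return four_pi_ms * 3.0 * math.pi * r**3 * h / (8.0 * sep**4)


if __name__ == "__main__":
    # Hypothetical example: 3 mil diameter domain in a 3 mil platelet,
    # 4*pi*M_s = 105 gauss, coercive force 0.25 Oe.
    sep = min_stable_separation(r=1.5, h=3.0, four_pi_ms=105.0, h_c=0.25)
    print(f"minimum stable separation: {sep:.2f} mils")
```

Because the separation varies as the fourth root of the other quantities, it is fairly insensitive to errors in the coercive force or magnetization.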

Finally it has been found useful to interact high
permeability magnetic film patterns with cylindrical
domains. Consider, for example, a matrix of permalloy
dots positioned on the surface of an orthoferrite platelet.
One finds, by experiment, that a cylindrical domain
prefers a position in contact with the permalloy as
shown in Figure 5. The permalloy dot diameters and
separations have been chosen to be consistent with
the stable cylindrical domain size in the orthoferrite
under study. The dots serve as localized flux closure
paths thereby reducing the magnetostatic energy.
They provide a shift register, a memory array, etc.,
with well defined domain positions.

Figure 4 — Two domains, mutually repelled in a
material whose coercive force is H_c, reach a
stable separation λ.

Figure 5 — Interaction between a matrix of high
permeability disks and a cylindrical domain

Conductor circuits 

In order to utilize cylindrical domains in shift 
registers, memories and logic circuits, we require 
motion in discrete steps at specific times. Therefore, 
highly localized fields are needed. Such fields can be 
produced by small conductive loops placed flat on a 
platelet surface. Since thin film techniques are used 
to fabricate the conductor circuits, a completely closed 
loop is not practical. 

Figure 6 illustrates the basic conductive loop con- 
figuration and the resulting field profiles. These were 
obtained by measuring the fields produced by an 
expanded scale replica of the thin film circuits. The 
circuit dimensions were chosen to provide controlled 
motion of domains whose diameters range from 3.5 to 
6 mils. In order for a domain to move to an adjacent 
loop it must initially be in contact with some portion 
of the positive gradient field produced by that loop. 

Figure 6 — Conductor circuit used to propagate
cylindrical domains and the resulting field
profiles for 200 mA applied current

This puts a lower limit on the domain size. The limit 
of maximum domain size is reached when a disparity 
of domain to applied field area results in reduced con- 
trol of the domain position. 

The most important feature of the semiclosed con- 
ductive loop circuit is that the field is confined to 
an area consistent with that of a domain. Therefore, 
the field may far exceed the value which would trans- 
form a domain from a cylinder to a strip. The upper 
limit on this field, however, is that value which would 
stretch the cylindrical domain into the strip area defined 
by the connections between the loops. 

The limits of the applied drive and bias fields are
illustrated in Figure 7. The data were obtained using
a 2.0 mil thick platelet of YbFeO3 operated in a quasi-static
fashion on a conductor pattern similar to that of
Figure 10. The operating contour resides within the
bias field extremes required to maintain a cylindrical
domain. The position and size of the operating contour
within the bias field boundaries are determined primarily
by the range of domain sizes which the circuit can
accommodate.

In Figure 8, velocity curves are given for domains
in YFeO3, TmFeO3 and YbFeO3 platelets. These are
functional measurements obtained using the circuit
of Figure 6. Rossol has shown that YFeO3 exhibits an
extraordinarily high mobility. 10 Functional velocity
measurements of YFeO3 have confirmed this. Data
rates in excess of 3 × 10^6 bits/sec have been reached.
A direct comparison of device speed and domain wall
mobility cannot be made because of the complex nature
of the field profile. Notice that threshold currents as
low as 10 mA have been measured, representing drive
fields less than 1.0 Oe.

Figure 8 — Functional velocity curves of YFeO3, TbFeO3
and TmFeO3 platelets.

Figure 9 — Thin film conductor pattern for two dimensional
propagation of cylindrical domains. Conductor
dimensions identical with those of Figure 6.

A conductor pattern is shown in Figure 9. Note
that the series of loops is interconnected such that
there are three separate interleaved circuits. Thus,
with a three phase system, a domain at position A can
be propagated to C with the sequential application of
currents I_y1, I_y2 and I_y3. Two dimensional propagation
can be performed by simply aligning two identical
circuits orthogonal to each other. The domain at position
A can now be propagated to B with currents
I_x1, I_x2 and I_x3. Bidirectional propagation merely
requires a reversal in the three phase sequence. The
domains (bits) are spaced on 10.5 mil centers or every
third propagate loop. This is adequate spacing to
avoid interactions in materials having a coercive
force of 0.25 Oe or higher such as YbFeO3. The resulting
packing density is over 6 × 10^3 bits/in^2.
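
The three phase sequencing described above can be mimicked with a toy model. In the sketch below, loops along one track are numbered 0, 1, 2, ... and loop k is assumed to belong to phase k mod 3; a pulse on a phase pulls in any domain sitting on an adjacent loop. The numbering, the function names and the rule that a domain always follows an energized neighbor are simplifying assumptions, not a description of the actual circuit.

```python
N_PHASES = 3  # three separate interleaved circuits


def pulse(positions, phase):
    """Apply one current pulse to every loop of the given phase.

    A domain moves to a neighboring loop if that loop belongs to the
    pulsed phase; otherwise it stays where it is.
    """
    moved = []
    for k in positions:
        if (k + 1) % N_PHASES == phase:
            moved.append(k + 1)
        elif (k - 1) % N_PHASES == phase:
            moved.append(k - 1)
        else:
            moved.append(k)
    return moved


def run(positions, sequence):
    """Apply a sequence of phase pulses and return the final positions."""
    for phase in sequence:
        positions = pulse(positions, phase)
    return positions


if __name__ == "__main__":
    bits = [0, 3, 6]                    # domains on every third loop
    forward = run(bits, [1, 2, 0])      # one full forward cycle
    backward = run(forward, [2, 1, 0])  # reversed sequence
    print(forward, backward)
```

With domains placed on every third loop, all bits advance together and keep their spacing, and reversing the phase order returns them to their starting loops, which is the bidirectional behavior described in the text.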

Figure 10 is a photograph of a unidirectional shift
register circuit. The register is equipped with an input
and output circuit. Information is written by controlled
domain replication and the output circuit detects a
change in flux. The circuit is operated with a biphase
propagating source. Directionality is achieved with
the help of permalloy dots. The dots, which provide
low energy sites for the domains, are uniformly shifted
with respect to the conductive loops. This asymmetry
places the domains in a consistent, preferred position
prior to each propagate phase. The permalloy in essence
provides a five Oe third phase drive. The permalloy
dots are, typically, 4000 Å thick, one mil in diameter
and spaced on four mil centers along the propagating
track. They are deposited on pedestals, fabricated
as part of the conductor circuit. This is done to ensure
that the permalloy is in intimate contact with the
orthoferrite. The biphase register design provides a means
of constructing long serial registers without necessitating
conductor crossovers. With a biphase system,
however, the packing density of domains is about one
fourth the propagate positions rather than one third,
as in the case of the three phase system. In addition,
speed is reduced by virtue of the limit of the pseudo-drive
provided by the permalloy.

Figure 10 — Photograph of the conductor pattern of a unidirectional
shift register utilizing a biphase propagating source. The
circuit contains a controlled replicate input and an output circuit
which detects a change in flux. Circuit is capable of propagating
domains having a nominal diameter of 4 mils

A suitable material for use with the device is TmFeO3.
A platelet two mils thick, exhibiting domains three to
five mils in diameter, was used. Operation is initiated
by placing a “source” domain in the starting loop. To
insert a bit, the larger loop encompassing the replication
(hairpin-like) conductor is energized, centering
the source domain over the replication conductor.
After the domain is split, one section is returned to
the start position and the other is simultaneously
shifted two loop positions to the start of the register.
The domain is shifted through the register until it
reaches the output circuit. The two outer conductor
loops are part of the readout drive circuit while the
two inner loops comprise the sense circuit. The readout
drive loop nearer the domain is energized, drawing
the domain into the loop and then expanding it to
the extent of the loop. The domain is then collapsed
by a reversed drive through both loops. The resulting
flux change is detected on one sense loop and the
induced voltage due to di/dt is cancelled with the
other. The domain is expanded to an area forty times
the area of the cylindrical domain and provides an
output of 1.0 mV-μsec. A photograph of the output
waveform is shown in Figure 11. Notice the bipolar
nature of the waveform. The output circuit has been
shaped not only to increase the area of the output
domain but also to maximize the rate of change of
flux linkages during the collapse phase.

The circuit has been operated at speeds in excess of
10^6 bits/sec using 350 mA propagate currents. The
minimum replicate drive pulse is 750 mA, 1 μsec wide.
The nominal readout drives for domain expansion
and collapse are 530 mA and 700 mA, respectively.

“Angelfish” circuits

We have progressed through three phase conductor 
circuits where the propagation direction is determined 
by the sequence in which current pulses are applied 
and two phase conductor-permalloy circuits where 
the propagation direction is built in by a nonsymmetric
conductor-permalloy alignment. A logical progression 
is the possibility of an all permalloy circuit to interact 
with, and thereby propagate, domains in orthoferrite. 
There are, in fact, two such general classes of circuits 
and they will be discussed in this and the following 
section. 

The first class, coined the “angelfish” circuits,
utilizes the fact that a cylindrical domain can be modulated
in size by increasing or decreasing the bias field.
Motion is achieved by maneuvering this pulsating
cylindrical domain in and out of asymmetrical energy
traps. The traps are created by wedge shaped films of
high permeability permalloy placed in contact with
the orthoferrite platelet.

Figure 11 — Photograph of the output waveform from the
circuit shown in Figure 10. Horizontal scale is 1
μsec/div; vertical scale is 2 mV/div

Figure 12 — Domain positioned on a wedge-shaped high
permeability permalloy thin film. The domain is more
easily moved off the point of the wedge (a)
than the blunt edge (b)

The interaction which exists between a cylindrical 
domain and a wedge is illustrated in Figure 12. The 
domains assume a position on a wedge where the 
magnetostatic energy is minimized. It was confirmed 
by experiment that from this position a domain is
more easily moved off the point (a) than off the
blunt end (b). The mechanical analogy is that it is
easier to walk up a ramp than to scale a wall. A shift 
register can be built which propagates domains along 
a series of wedges by means of a periodic modulation 
of the diameter of the domains. During the expansion 
phase the leading domain wall reaches out to latch 
onto the blunt edge of the next wedge and during the 
contraction phase the trailing domain wall slides off 
the point of the wedge that held it. This pushing and 
pulling action provides the unidirectional motion 
desired. 

An experimental 32-step shift register, shown in 
Figure 13, propagates domains continuously around 
a circle. The permalloy circuit is photoetched from a 
4000 Å permalloy film. The size can be estimated by
noting that the outer ring is 50 mils in diameter. The 
inner and outer permalloy rings provide lateral sta- 
bility to the domain as it travels. Lateral stability is 
not required because of any inertia associated with 
the domain, but to ensure that the domain will expand 
and contract along the direction of motion rather than 
across. Operation is obtained as the bias field is oscillated
between the extremes of 38 and 44 Oe. The orthoferrite
used was a 2.3 mil thick platelet of Tb0.5Tm0.5FeO3.

Figure 13 — A section of a 32-step unidirectional ring
“angelfish” register. The bias field is 38 Oe (a),
44 Oe (b), 38 Oe (c), 44 Oe (d). Motion is
counterclockwise


Propagation by “T-BAR” permalloy circuits

In a second method of propagation an in-plane 
rotating field acting on a structured permalloy pat- 
tern generates traveling positive and negative mag- 
netic poles to selectively attract and repel and thereby 
control the motion of a cylindrical domain. A variety 
of permalloy patterns are suitable and one such pat- 
tern, the T-BAR, is illustrated in Figure 14. The 
name, T-BAR is, of course, identified with the high 
permeability thin film permalloy pattern shown in 
contact with the upper surface of an orthoferrite 
platelet. 

The operation of this circuit will be most readily
understood after a study of Figures 14 and 15. First
the bias field is adjusted to maintain a stable cylindrical
domain. Next assume that a field is applied in
the plane of the orthoferrite and directed as illustrated
in Figure 14. This in-plane field, which has very little
direct effect on the orthoferrite, produces magnetic
poles in the structured permalloy circuit thereby providing
the cylindrical domain with the low energy rest
position shown. Clockwise rotation of the in-plane
field causes a systematic redistribution of the magnetic
poles in the permalloy and the domain responds by
moving from left to right as photographed in Figure
15(a)-15(e). With each rotation of the field the domain
advances one period of the circuit. The propagation
direction may be reversed by rotating the field
in the counterclockwise sense.

Figure 14 — Isometric view of permalloy T-BAR
pattern in contact with surface of orthoferrite
platelet. Rotating in-plane field generates poles
which cause the domain to move

Figure 16 shows a typical domain generator. The
entrance to the T-BAR propagating channel is from
the left if the field is rotating clockwise. The large
generator disk at the entrance maintains a domain
which stays in contact with the + poles formed on the
disk by a rotating transverse magnetic field. As the
field rotates to the position shown in Figure 16a, the
domain is forced to pass over the first + pole formed
at the left end of the propagating channel. When the
field rotates another quarter cycle, Figure 16b, one
end of the domain becomes attached to the advancing
+ poles of the propagating channel while the other
remains attached to the + poles of the disk. As the
field rotates further, Figure 16c, the two ends of the
domain are forced to travel in opposite directions, and
a negative pole distribution begins to build up near the
center of the stretching domain, forcing it away from
the disk. When the negative pole distribution is maximum
near the stretched portion of the domain, Figure
16d, the field from the disk shrinks that portion of
the domain width until it becomes unstable and the
domain suddenly ruptures into two portions, one
remaining on the disk and the other remaining in the
propagation channel. Both domains then return to a
domain size determined by the bias field with the result
shown in Figure 16e.

Figure 15 — Sequence of photographs showing a 2 mil
diameter domain propagating as the field rotates
clockwise through 360°

Figure 16 — Domain generation. A permanent domain
associated with the rotating + pole configuration of the generator
disk is forced to stretch when one end becomes trapped in
the T-BAR propagate channel. When the in-plane rotating
field H_R is directed upward, the - poles near the stretched
portion of the domain cause it to sever into two, leaving a newly
formed domain in the propagate channel

In general the minimum transverse field required for 
domain generation is larger than the minimum field 
for propagation; therefore, insertion of domains 
into a single channel can be controlled by increasing 
the amplitude of the rotating transverse field for either 
an entire cycle or for only that portion of the cycle 
(approximately cycle) where the domain becomes 
stretched to its maximum. Insertion of information in 
multichannel devices (say up to ten channels) can 
be controlled by designing the geometry of the gen- 
erators so that either the amplitude of the rotating 
field, or the portion of the cycle it must be increased, 
or both, is different for different channels. 

An example of domain generation uses a magnetic
overlay made from 8900 Å isotropic permalloy. The
T-BAR propagation channel has the same dimensions
as previously stated and the generator disk is 9
mils in diameter with a 2.5 mil protrusion into the
propagate channel. The orthoferrite is a 2 mil
thick platelet of Sm0.55Tb0.45FeO3 with 4πM_s = 108
gauss. The bias field is 42 Oe producing approximately
1.5 mil diameter domains. The transverse field amplitude
necessary to generate domains is 20 Oe peak
while 10 Oe peak is sufficient to propagate domains.

TABLE I

                                    Experimental                      Calculated
Rare Earth      4πM_s    M_s      2r      Bias field  Thick, h    *d      σ_w
                (gauss)  (gauss)  (mils)  (Oe)        (mils)      (mils)  (ergs/cm^2)

Y               105      8.4      3.0     33          3.0         2.5     1.8
La              83       6.6      Not Available
Pr              71       5.7      Not Available
Nd              62       4.9      7.5     3.2         2.0         4.4     1.1
Sm              84       6.7      6.0     3.0         1.1         2.9     1.3
Eu              83       6.6      5.5     10.5        2.0         3.7     1.6
Gd              94       7.5      3.7     16          2.4         2.9     1.7
Tb              137      10.9     1.7     51          2.2         1.4     1.7
Dy              128      10.2     2.0     32          1.6         1.7     1.8
Ho              91       7.3      4.5     12          2.1         3.3     1.7
Er              81       6.5      6.0     8           2.0         3.9     1.6
Tm              140      11.2     2.3     37          2.3         1.9     2.4
Yb              143      11.4     3.8     41          3.0         3.0     3.9
Lu              119      9.5      7.5     10.5        2.0         4.3     3.9
Sm0.6Er0.4      83       6.6      1.0     33          1.8         0.80    0.35
Sm0.55Tb0.45    108      8.6      0.75    61          2.0         0.40    0.30
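
A quick arithmetic check on Table I: the second numeric column appears to be the first divided by 4π, that is, M_s obtained from 4πM_s. The sketch below tests that reading on a few rows copied from the table. The column interpretation is an inference from the numbers, and the tolerance is an arbitrary allowance for rounding in the printed values.

```python
import math

# (material, first numeric column, second numeric column) from Table I
ROWS = [
    ("Y", 105.0, 8.4),
    ("Gd", 94.0, 7.5),
    ("Tb", 137.0, 10.9),
    ("Yb", 143.0, 11.4),
    ("Sm0.55Tb0.45", 108.0, 8.6),
]


def ms_from_four_pi_ms(four_pi_ms: float) -> float:
    """Saturation magnetization M_s from the tabulated 4*pi*M_s."""
    return four_pi_ms / (4.0 * math.pi)


def max_discrepancy(rows) -> float:
    """Largest difference between computed and tabulated M_s."""
    return max(abs(ms_from_four_pi_ms(a) - b) for _, a, b in rows)


if __name__ == "__main__":
    for name, four_pi_ms, ms_tab in ROWS:
        print(f"{name:>14}: computed {ms_from_four_pi_ms(four_pi_ms):.2f}, table {ms_tab}")
```

The agreement to within a few hundredths supports reading the two columns as 4πM_s and M_s in the same units.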

Domain logic 

Logic can be performed in cylindrical domain de- 
vices by utilizing the repelling forces between domains. 
T-BAR-like overlays are used to transport domains 
close enough to allow the interactions to occur. An 
overlay arrangement particularly useful for performing 
logic functions is that of an idler position into which 
a domain can be inserted and forced to circulate within 
a relatively fixed position as the transverse field rotates. 

An example of domain logic uses the permalloy
overlay of Figure 17. A logic variable N is determined
by the presence or absence of a domain circulating in
the idler position formed by the four bars which provide
the pole positions 4, 5, 6, 7. The input
variable X is determined by the presence or absence
of a domain in the T-BAR track defined by pole
positions ..., -3, -2, -1, 1, and two output tracks
3', 4', 5', ... and 7', 8', 9', ... deliver the logic function
X · N. N is the flip flop function
$N = \bar{X} \cdot (N-1) + X \cdot \overline{(N-1)}$,
where (N-1) is the previous state of the
flip flop. Poles 2 and 6 are positioned so that if
there is a domain on one of the poles, and none on the
other, poles 3 or 7, respectively, are preferred
over poles 3' or 7' for the next step. As the transverse
field rotates counterclockwise, a domain entering this
device will travel along successively generated poles
-3, -2, -1 and 1. When it reaches pole 2 it
makes a decision to go to pole 3' or 3 depending
on whether a domain is present or not on the idler
position 6. If a domain is present on 6, the two
domains repel each other and go to poles 3' and 7'
when the field rotates the next quarter cycle and
henceforth stay on the output tracks 3', 4', 5', ... and
7', 8', 9', ..., leaving the idler position empty. However,
if there is no domain on pole 6 when the input domain
reaches pole 2, the input domain goes next to pole
3 and becomes trapped in the successively generated
idler poles 4, 5, 6, 7, 4, 5, ... until a new domain
from the input track forces it out. The device,
therefore, acts like a flip flop with one input
and two identical outputs. The presence or absence
of a domain in the idler position determines the state
of the flip flop. A binary counter can be made by using
one of the outputs as a carry to succeeding stages.
Flip flops have been operated by using 11,000 Å
permalloy with the overlay design consisting of the
usual one mil by five mil rectangles. The orthoferrite
was TbFeO3, with a 54 Oe bias producing 3 mil
diameter domains. The rotating field peak amplitude
was approximately 17 Oe.

Figure 17 — Cylindrical domain flip flop. The state of
the flip flop is determined by the presence or absence of a trapped
circulating domain at the sequencing pole positions (idler) 4, 5,
6, 7. Each new domain entering the input channel X changes the
state of the flip flop by becoming trapped in the idler if it is empty
or by clearing the idler if it is full
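
The flip flop behavior and the binary counter built from it can be sketched in a few lines of Python. This is a purely logical toy model under stated assumptions: the class and function names are hypothetical, the timing of the rotating field is ignored, and only one of the two identical outputs is followed as the carry.

```python
class DomainFlipFlop:
    """Toy model of the idler flip flop; state = idler occupied or not."""

    def __init__(self) -> None:
        self.idler_full = False

    def step(self, domain_in: bool) -> bool:
        """Process one input slot; return True if a domain leaves on the
        output tracks (which happens only when the idler was occupied)."""
        if not domain_in:
            return False              # no input domain: state unchanged
        if self.idler_full:
            self.idler_full = False   # both domains are repelled outward
            return True
        self.idler_full = True        # input domain is trapped in the idler
        return False


def count_domains(n_domains: int, n_stages: int) -> list:
    """Feed n_domains into a chain of flip flops, using one output of
    each stage as the carry into the next. Returns the idler states,
    least significant stage first."""
    stages = [DomainFlipFlop() for _ in range(n_stages)]
    for _ in range(n_domains):
        carry = True
        for stage in stages:
            carry = stage.step(carry)
            if not carry:
                break
    return [stage.idler_full for stage in stages]


if __name__ == "__main__":
    print(count_domains(5, 3))
```

Reading the idler states as bits, least significant first, gives the number of domains entered modulo 2 raised to the number of stages.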

CONCLUSIONS

We have seen that the orthoferrites provide interesting 
research material for both the theoretician and the 
experimentalist. Papers covering the wide swathe 
from materials preparation to device applications 
have been published. All available orthoferrites have 
been evaluated as potential domain wall device 
materials. It was found, for example, that the use of
Sm0.55Tb0.45FeO3 orthoferrite will maximize the storage
density since in this compound the smallest domains
are found. Stable cylindrical domains 0.5 mil in diameter
allow storage densities of 10^6 bits/in^2.

Techniques for propagating domains at data rates
in excess of three megabits/sec have been demonstrated
using conductor circuits. The upper limit on the data
rate for either the “angelfish” or “T-BAR” circuits is yet to be
determined although it is expected that the rate for
the latter will be in excess of one megabit/sec. Thus we
believe that one of the future applications of domain
wall devices will be in large capacity shift registers,
in effect a solid state disk file.

Although most of the device work presented in this 
paper concerned the propagation of domains, other
efforts have pursued the areas of information insertion 
and detection, and magnetic logic. Magnetic logic is 
readily implemented using interactions of domains. 
Therefore, a second application is expected in special 
purpose memory-logic systems. 

Domain wall devices are fabricated using the pro- 
duction techniques pioneered by the semiconductor 
industry. Thus these devices should be a compatible 
companion to LSI in future systems. Domain wall 
devices require few process steps and as such should 
be manufacturable in high storage capacity units. 

A new integrated magnetic memory


by M. BLANCHON and M. CARBONEL 

THOMSON-CSF 
Laboratoire Central de Recherches
Essonnes, France 


INTRODUCTION 

Very thin permalloy sheets were used by RCA, 1,2 in 1963,
in order to achieve integrated magnetic memories. In
1964, LFE 5 described an approach to mass memories
(10^7 to 10^9 bits) using this material and an integrated
wiring. For different reasons, these two projects were
abandoned. This paper shows that the two conditions
of success are the choice of the shape of the element
and the integration process.

First, the shape of the element is discussed and it 
appears that the toroidal shape is unsuitable for the
realization of large integrated memory planes. Un- 
like the ordinary core, the three-hole element 4 has 
very broad tolerances on driving currents and on mag- 
netic characteristics of the material. Therefore, the 
three-hole core was chosen for the integrated memory 
plane described in the third part of the paper. 

Then, the drawbacks of the usual integration processes
are underlined and a new, much more reliable
method is proposed. A 16 × 8 bit plane and a 32 × 36 bit
plane were realized using this fabrication process.
The characteristics of these memories are presented in
the last part of the paper.

Memory device characteristics 

Batch-fabrication of memory planes necessitates 
a careful study of the characteristics of the memory 
element. The simplest shape is the toroid. 

Characteristics of the toroid 

Consider the element shown in Figure 1 and let
us plot the switched flux versus the driving current,
when only one current pulse is present (curve A) and
when a large number of identical pulses is sent (curve
B). For correct memory operation the toroid must
switch completely for I and must not switch for I/2.
Let us name I_MIN the minimum current needed to
switch 90 percent of the flux with a single pulse (curve
A), and I_MAX/2 the maximum current allowed to switch
only 10 percent of the flux with a large number of
pulses (curve B). The required conditions are I > I_MIN
and I/2 < I_MAX/2. This is not possible for the permalloy
1/2 mil toroid where I_MAX/2 < 1/2 (I_MIN). For other
pulse widths, other shapes or other thicknesses (1/8
mil to one mil) this is still not possible. Thus, one is
led to use more elaborate driving currents such
as bipolar digits or doublet currents. 5 With these
improvements, the permalloy toroid memory will
work but with relatively tight tolerances. However,
in batch-fabrication of a large number of toroids,
tight tolerances will lead to low yield. Therefore, toroids
were abandoned.
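
The two selection conditions can be combined into a single window test on the full drive current I. The sketch below is only an illustration: the function name and the numerical currents are hypothetical, chosen to show one case with no valid drive current and one with a finite window.

```python
def drive_current_window(i_min: float, i_max_half: float):
    """Return the (low, high) bounds on the full current I satisfying
    I > I_MIN and I/2 < I_MAX/2, or None when no such current exists.

    i_min      -- current switching 90 percent of the flux in one pulse
    i_max_half -- largest half-select current that switches no more than
                  10 percent of the flux after many pulses
    """
    low = i_min
    high = 2.0 * i_max_half
    return (low, high) if low < high else None


if __name__ == "__main__":
    # Hypothetical currents in mA. In the first case the half-select
    # limit is below half of I_MIN, the situation reported for the
    # permalloy toroid, so no operating window exists.
    print(drive_current_window(100.0, 40.0))
    print(drive_current_window(100.0, 60.0))
```

The width of the returned window is a direct measure of the drive current tolerance, which is why a narrow or empty window translates into low yield in batch fabrication.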

The three-hole element 

Intricate magnetic elements are easily obtained by
etching permalloy sheet. The three apertured element
has many advantages for storage. Diagrams illustrating
the operation of the element are shown in
Figure 2. The four legs of the element are of equal
width. Starting from the clear state, the one-state is
written by applying the word-write drive alone. This
will work for any value of the word current provided
i_w > i_wo where i_wo is the magnetic threshold of the element.
A zero state is written by applying simultaneously
a digit and a word current, the only condition being
that the digit current exceed the word current. Applying
a disturb digit drive has no effect on the zero
but produces a flux rearrangement on the one state,
magnetically decoupling the left hole from the output
hole. Subsequent disturbs will therefore have no
effect. As may be seen from the bottom of Figure 2,
the operating range is very wide and is not closely
dependent on the magnetic characteristics of the material.

Figure 1 — S-curves of 1/2 mil thick etched permalloy toroids

Figure 2 — Operation of the three hole element

This results in wide tolerances and a wide operating
temperature range. Furthermore, a lack of reproducibility
in the material or in the element shape is not
important. The three apertured element is therefore
very suitable for batch processing of integrated magnetic
memories.

Memory plane fabrication 

The processing technique is extremely important 
for obtaining a good yield. Let us consider an example
(Figure 3). The element here is a simple toroid and in
the usual integration technique we find a lower winding 
and an upper winding tied together by a through-con- 
nection, thus creating one or two interfaces. These 
interfaces may be a thin layer of vacuum deposited 
copper 3 or an electrolytic solder. 6 This results in a serious 
lack of reliability (broken conductors). 

Another drawback comes from the insulation be- 
tween the winding and the elements. Since there are 
always pinholes in the insulators, there are often short 
circuits. One should note that the insulation of the 
edge of very thin elements is generally extremely difficult. 1, 2

Finally the strains induced by the deposition of the 
windings may decrease the uniformity of the output 
signals. 

All these drawbacks lower the fabrication yield and 
the permalloy sheet memories become uneconomical. 

Figure 4 — The new fabrication process: cross sectional
view (panels a, b, c; labels: Hole, Photoresist)

The new method described here starts from a permalloy
sheet (1/2 mil thick). The permalloy is electroplated
with copper (1/2 mil). Using positive photoresist
techniques, holes are etched in the plate (Figure 4a
and Figure 5). Then the sheet is exposed to the
wiring pattern, developed and gold is electrodeposited 
to make the winding (Figure 4b). The copper is selec- 
tively removed, leaving intact the permalloy and the 
gold winding. At this point, the winding is held only 
by the edges of the holes in the permalloy sheet,
forming small bridges over the permalloy. The sheet
is then dipped in photoresist which takes the place of
the copper. After an exposure to the element pattern,
the magnetic elements are etched (Figure 4c). If
desired, the memory may be completed by an encapsulation.

Figure 5 — Top view

This method is attractive for several reasons: 

• Since the upper winding, the through-connection 
and the lower winding have been deposited at 
the same time, the wiring is continuous without 
any interface and this is the reason why it is ex- 
tremely rare to find a broken conductor. 

• Since electroplating tends to fill up all the holes, 
there are no pinholes at all in a 1/2 mil copper lay- 
er. Therefore, there are no short circuits in these 
memories. 

• Since the elements are etched after the wiring, 
there are no insulation edge problems. 

• Mechanical stresses may arise from the electro- 
plating of the copper layer and the gold winding. 
Removing the copper and etching the element 
shape relieves the residual stresses of the permalloy. 

Experimental results 

Memory plane models of 16 words X 8 bits and
32 words X 36 bits were easily fabricated using these
techniques (Figure 6). High yield of acceptable planes
seems possible even with larger planes.

Figure 6 — Photographs of 128 and 1152 bit memory
planes

Characteristics for the 1152 bits storage planes are 
given in Table I. 

TABLE I — Memory Plane Characteristics 

.5 mil 
50 mA 
50 mA 
100 mA 

250 bits/cm² (1580 bits/sq in)
< 5 µs

1.0 mV, .7 µs

The uniformity of the output signals is excellent, as
may be seen from Figure 7 where the outputs of 32
three apertured elements are shown superimposed.

CONCLUSION 

Until now, integrated permalloy sheet memories were
not a success. This comes from the choice of the element
shape and the processing technique. By using a three
apertured element and a new, much more reliable fabrication
method, these memories seem to have a bright
future for mass memories. Higher densities and larger
planes (256 X 72) are under study.

Figure 7 — Superimposed outputs of 32 elements (zero,
one and disturb one). Horizontal: 100 ns/cm; vertical: 1 mV/cm
Mated film memory — Implementation of
a new design and production concept 


by L. A. PROHOFSKY and D. W. MORGAN 

UNIVAC, Division of Sperry Rand Corp, 

St. Paul, Minnesota 

INTRODUCTION 

A high performance computer memory must operate 
at high speed, require a minimum amount of power, 
and be capable of operating under extreme environmental
conditions. Thin film memories meet these requirements;
however, anyone who expected them to
become the primary memory technology was certainly
premature. Despite its superior performance features,
the thin film memory has encountered producibility 
problems which have prevented it from becoming cost 
competitive. Univac has developed the MATED 
FILM* memory concept and a continuous vacuum 
deposition system which together have overcome 
previous producibility obstacles and now make the 
evaporated film memory a serious contender for main 
store applications. 1 The features which are new and 
unique to this approach are: 

1. Economical continuous deposition for 16-hour 
periods with all deposition parameters main- 
tained in equilibrium. 

2. The closed-flux path design has wide operating
margins and provides an exceptionally low 
susceptibility to process variations. 

3. Changing the film array organization from a 
word-bit matrix to a bit-slice array has greatly 
reduced the number of connections and process 
steps required to fabricate the memory stack. 

This paper describes: (1) the MATED FILM
memory design which can be adapted to a wide range
of capacity and speed; (2) the continuous vacuum deposition
facility which has been developed for the
production of MATED FILM memories; and (3) a
wide temperature, 500 nanosecond, 5 X 10^5 bit memory
which has been built and tested.

* Trademark of Sperry Rand Corporation.

Storage element 

Construction 

The storage element (Figure 1) is formed by a 
deposit of two layers of nickel-iron separated by a thin, 
deposited, copper conducting strip. Silicon monoxide 
layers isolate the nickel-iron layers from the copper 
layer. The layers of silicon monoxide are sufficiently 
thin so they do not interfere with the magnetic coupling 
of the two nickel-iron layers. 

Each layer is deposited through masks on glass 
substrates in a vacuum chamber (Continuous Vacuum 
Deposition System). When completed, the copper 
conducting strips form the sense/digit line enclosed 
by the two magnetic layers. 

An etched high permeability keeper is placed in 
close proximity to the deposited element (Figure 2). 
The storage element and the keeper are separated by 
a one mil insulating coating to avoid any shunt current 
paths through the keeper. The storage element now 
has a closed magnetic flux path for both the transverse 
and longitudinal axes. The advantages of this con- 
figuration are: (1) The transverse and longitudinal 
demagnetizing fields are reduced. This results in lower 
drive currents and improved operating margins. (2) Interaction
between adjacent bits is reduced to a negligible
level. (3) Word line to sense line capacitance, which is
a source of word noise, is minimized.

Figure 1 — Storage element (exploded view)

Figure 2 — Storage element drive fields (word current,
transverse; digit current, longitudinal; film signal)

Theory of operation 

During deposition, a strong magnetic field produces 
a uniaxial magnetic anisotropy in the films of the 
storage element. Therefore, magnetization of the storage
element exhibits a preferred axis in the plane of
the element normal to the deposited sense/digit line. 2
A stored “1” or “0” magnetic state of the storage
element is determined by the direction of magnetization
around the sense/digit line and parallel to the anisotropy
axis of the film. The magnetic flux resulting from
a stored “1” or “0” closes through the silicon monoxide
insulating layers and around the sense/digit line.

Figure 3 — Signal drive current relationship

Readout of the storage element is accomplished by 
passing word current through the word line. The 
resultant transverse field rotates the magnetization 
of the storage element, which induces a voltage in the 
sense/digit line. 

The initial direction of magnetization determines the 
polarity of the induced voltage. The relationship of 
the word current, film signal, and digit current is shown 
in Figure 3. The rotation of the storage element 
magnetization occurs during the rise time of the word 
current. 

Passing a current of selected direction through the 
sense/digit line restores or writes a “1” or “0” in the
storage element. The resultant longitudinal field over- 
laps the trailing edge of the word current field and 
steers the magnetization to a state determined by the 
direction of the digit current. 

Nominal operating characteristics 

Storage element operating characteristics are ob- 
tained by plotting output flux as a function of drive 
currents for prescribed reading and writing conditions. 
The total output flux is obtained by integrating the 
output voltage with respect to time. Since in normal
operation digit current is common mode in the sense/ 
digit line pair, all digit currents are given as total 
array currents. This is twice the single element current. 

Output flux vs read word current 

Figure 4 shows the output flux of a typical element 
as a function of element word current for both the 
“1” and “0” states. 
Figure 4 — Output flux vs. read word current

Figure 5 — Output flux vs. digit current


The curve provides information on element output, 
symmetry (skew), and operating word current ampli- 
tude requirements. The curve is an actual plot obtained 
by: 

1. Writing adverse history 256 times.*

2. Writing once in the opposite direction.

3. Reading once and recording flux output at the
indicated word current level.

4. Repeating steps (1), (2), and (3), incrementing
the read word current each time.

History and write word current amplitude: 500
milliamperes.

Digit current amplitude: 50 milliamperes.

* For transverse fields exceeding the write threshold but below
the saturating write level, the degree of saturation achieved
becomes a function of the number of pulses applied. The first
pulse will write a portion of the film while each succeeding pulse
writes a little more. In this way, the film asymptotically approaches
the maximum magnetized state for the given field.

Adverse history consists of a sufficient number of pulses to
ensure that the element is conditioned prior to write with the
magnetic state worst case for the write operation. It has been
observed that there are no significant history effects beyond 250
pulses. In memory applications, the element is operated beyond
the saturating write level, where history effects are negligible.

Figure 6 — Output flux vs. write word current

Figure 7 — The mated film core array

Output flux vs digit current

Figure 5 shows the output flux level obtained with 
a fixed word current of 500 milliamperes as a function 
of digit current after repeated digit disturbs. 

The plot was obtained by: 

1. Writing adverse history 256 times.

2. Writing once in the opposite direction with the 
indicated digit current. 

3. Digit disturbing 256 times with the indicated 
digit current. 

4. Reading once and recording the output flux. 
5. Repeating steps (1), (2), (3), and (4) while
incrementing digit current each time.

Write word current amplitude: 500 mil- 
liamperes. 

Write digit current amplitude: 50 mil- 
liamperes. 

A digit current of 25 milliamperes is sufficient to
write, while a current of over 80 milliamperes is required
to digit disturb the storage element.

Output flux vs write word current 

Figure 6 shows the output flux level as a function of 
write word current with fixed read word current and 
fixed write digit current. 

The plot was obtained by:

1. Writing adverse history 256 times. 

2. Writing once in the opposite direction at the 
indicated word current. 

3. Reading once and recording flux output. 

4. Repeating steps (1), (2), and (3) while incre- 
menting the write and history word current 
each time. 

Read word current amplitude: 500 mil- 
liamperes. 

Digit current amplitude: 50 milliamperes. 

The write threshold occurs at 300 milliamperes and 
a saturated write is accomplished at 500 milliamperes. 
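
The four-step procedure for the write word current plot can be sketched as a measurement loop. This is a minimal Python illustration, not the authors' test equipment: `sweep_write_current` and `ToyElement` are hypothetical names, flux is normalized to the range -1 to +1, and the toy element simply assumes a linear rise in the fraction written per pulse between the 300 milliampere threshold and the 500 milliampere saturated write quoted above.

```python
from typing import Callable, List, Tuple


def sweep_write_current(
    write: Callable[[int, float], None],
    read: Callable[[], float],
    currents_ma: List[float],
    history_pulses: int = 256,
) -> List[Tuple[float, float]]:
    """Return (write word current, output flux) pairs for one sweep."""
    results = []
    for i_ma in currents_ma:             # step 4: repeat at each current
        for _ in range(history_pulses):
            write(-1, i_ma)              # step 1: adverse history
        write(+1, i_ma)                  # step 2: write once, opposite direction
        results.append((i_ma, read()))   # step 3: read once, record flux
    return results


class ToyElement:
    """Hypothetical stand-in for a storage element (normalized flux)."""

    THRESHOLD_MA = 300.0    # write threshold from the text
    SATURATION_MA = 500.0   # saturated write from the text

    def __init__(self) -> None:
        self.flux = 0.0

    def write(self, polarity: int, i_ma: float) -> None:
        span = self.SATURATION_MA - self.THRESHOLD_MA
        frac = min(max((i_ma - self.THRESHOLD_MA) / span, 0.0), 1.0)
        # ASSUMPTION: each pulse moves the flux a fraction of the way
        # toward the driven polarity, so repeated pulses approach it
        # asymptotically below saturation.
        self.flux += frac * (polarity - self.flux)

    def read(self) -> float:
        return self.flux
```

With this toy model, a single write at 500 milliamperes fully reverses the adverse history, while a write at an intermediate current only partly reverses it.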

Memory array 

An array of 1024 active storage elements plus 32
spares is vacuum deposited on a photo-etched glass
substrate (Figure 7). The deposited sense/digit line
pair links all bits on the array making this a 1024 word 
by one bit slice of the memory. The storage element, 
in the shape of a capital I, is shown in the enlarged 
view of the array (Figure 8). The body of the I is the 
active region of the element. The remainder of the 
element is always in a demagnetized state; however, it
serves the useful function of reducing the transverse 
demagnetizing field. The two holes, which straddle 
each element, accommodate the word lines. 

Continuous vacuum deposition system 

The continuous vacuum deposition system is the
single most significant feature which sets MATED FILM
memory array processing apart from conventional
batch processing systems. Operational shakedown tests
on the system have been completed. These tests
demonstrated the system’s feasibility as well as its
capability. The capacity of the present system is
10^8 bits per year. A program to increase this rate will
put the facility in a full capacity mode of 1.6 X 10^9
bits per year by early 1970.
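
As a rough check on what these rates mean in arrays, the short Python sketch below divides the quoted capacities by the 1024 active bits per array. It is an illustration only: the constant names are hypothetical, and spare elements and yield losses are ignored.

```python
BITS_PER_ARRAY = 1024                    # active storage elements per array
PRESENT_BITS_PER_YEAR = 10**8            # present system capacity
FULL_BITS_PER_YEAR = 16 * 10**8          # 1.6 X 10^9 bits per year


def arrays_per_year(bits_per_year: int) -> float:
    """Arrays needed per year to supply the given number of active bits."""
    return bits_per_year / BITS_PER_ARRAY


# Ratio between the planned full capacity and the present capacity.
scale_up = FULL_BITS_PER_YEAR / PRESENT_BITS_PER_YEAR
```

Under these assumptions the planned increase is a factor of 16, from just under 100,000 arrays per year to about 1.56 million.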

Continuous fabrication 

MATED FILM memory arrays are fabricated by a 
continuous vacuum distillation process using an in-line 
concept of material flow. Glass substrate blanks travel 
sequentially through four deposition chambers (Figure 
9) where progressive layers of magnetic alloy, copper, 
and insulator material are deposited through precision 
contact masks. The lost time due to pumpdown, sub- 
strate heating, and substrate cooling in a batch process 
is saved in this continuous process once steady state 
vapor composition is achieved. Typically this is 20 
minutes after start-up. 

Within the vacuum chambers, the various materials 
are vaporized continuously and the deposition is 
monitored and controlled automatically. A production 
cycle of 16 hours during a 24-hour period is realized 
using this process. 

In conventional batch distillation processes, the 
composition of a multi-component vapor is a time 
dependent function. The higher volatility fraction 
vaporizes in a proportion greater than its melt fraction. 
To achieve a deposited alloy film of a precise compo- 
sition, for example zero-magnetostriction iron-nickel 
alloy, the vapor stream must be captured at a point 
in time determined by composition versus distillation 
time. 3 With continuous fabrication, the process control 
is built around negative feedback techniques which 
routinely control the composition of the alloy vapor
for continuous periods of 16 hours. The vapor distilled
by this steady-state process produces constant zero-
magnetostriction nickel-iron vapor for time periods
measured in hours rather than minutes.

Figure 9 — Schematic, thunderbird facility

The separate production stations (the four deposition 
chambers) of the continuous system permit corrections 
to be made easily and quickly. Also, the continuous 
emergence of arrays allows for prompt monitoring 
of the system. After each deposition stage, the sub- 
strates are removed and inspected. As soon as a defect 
is detected, the continuous system can be stopped and 
the problem isolated and corrected. Loss of process 
control in the batch system, no matter when it is de- 
tected, usually results in loss of the entire batch. 

System description 

MATED FILM memory arrays are fabricated in 
four identical continuous vacuum evaporators. Each 
evaporator (Figure 10) consists of a deposition system, 
a transport system, and a pumping system.


Figure 10 — Continuous vacuum evaporator

The deposition system is capable of evaporating up
to three source materials concurrently at specific
rates. Electron beam heated sources are used for nickel-
iron, copper, and chromium. The SiO source is resistance
heated. Evaporant shutters above each source
automatically expose the substrate for a predetermined 
time interval. The evaporation rate of the nickel-iron 
and copper sources is controlled using a vapor rate 
monitor. The monitor signal is used to regulate the 
electron beam gun emission current. Evaporated 
materials are replenished by wire feeders which draw 
nickel-iron or copper wire from a spool and guide it 
into the molten source. The removable base plate, 
which contains all of the deposition equipment except 
the shutters and vapor rate monitor, fits onto the bot- 
tom of the main chamber. 

The transport system moves substrates from a 
magazine in the input chamber to the main chamber, 
where the depositions are made, and then into the 
output chamber. Heaters raise the substrate to dep- 
osition temperature during transit from the input 
chamber to the deposition chamber. The substrates 
pass through a water-cooled tunnel in the cooling sec- 
tion of the transport system, which cools them to 
handling temperature before they enter the exit
chamber. 

The automatic pumping system has three interlocked 
subsystems controlled from a single console. The 
pumping system maintains high vacuum in the deposition
section of the evaporator, while cycling the input
and output chambers from atmospheric pressure down
to high vacuum as required by the transport section. 


System operation 

After the substrate is inspected for possible defects, 
it is placed in the substrate holder and covered with 
the first mask. Subsequent substrates and masks are 
loaded in holders and placed in a cartridge. A cartridge 
of holders is loaded into the input chamber of Station 1. 
The holders are automatically ejected from the car- 
tridge and pushed sequentially toward the deposition 
chamber. Within each deposition chamber the substrate 
is exposed at two of the three positions or windows 
available. At the first position the bottom magnetic 
layer is deposited in the memory bit pattern. At another 
position the silicon monoxide is deposited over the 
magnetic alloy through the same mask. When this 
process is completed, the holders are pushed to the 
exit chamber. 

After it has been removed from the exit chamber, the 
substrate with the first magnetic alloy and silicon 
monoxide layers is inspected, and returned to the 
substrate holder with the mask for the interconnecting 
elements. The cartridge is then reloaded into the input 
chamber of Station 2. Using the same procedure, a thin
adhesion layer of chromium is deposited for the sense
line interconnecting elements, followed by an overlaying 
deposit of copper. 

At Station 3, the substrate is again removed, in- 
spected, and loaded into the input chamber using 
different masks for the sense/digit conductor deposi- 
tion. Chromium, copper, and silicon monoxide are 
deposited using all three exposure positions. 

At Station 4, the top magnetic layer and silicon 
monoxide are deposited. The outer film of silicon 
monoxide seals and insulates the memory bits. 

At this point, the completed film arrays are ready 
for functional testing before being assembled into 
memory stacks. 

Memory stack construction 

The MATED FILM memory can be thought of as 
a two wire system. One axis of stringing and its asso- 
ciated connections are an integral part of the previously 
described deposition process. To complete the stack 
it is only necessary to string the word axis and terminate 
these word lines in the word diode selection matrix. 

The memory plane assembly is formed by bonding 
two film arrays to a single keeper, as shown in Figure 11. 
In this form the array is less susceptible to scratching 
or cracking during subsequent assembly. 

The film arrays are combined to form a 1024 word 
by n bit substack with one array for each bit in the 
memory word. The substack can then be arranged in 
various series/parallel configurations to meet specific 
system requirements. The design will accommodate 
word length up to 256 bits without affecting cycle time. 

Figure 12 is an exploded view showing the substack
construction. The memory planes are stacked with the
etched holes vertically aligned; half of each word loop
is connected to the bussed word line header and is
threaded down through the substack, while the remaining
half of each word loop is threaded from the
bottom of the substack. The preformed wire wraps
connect the word loops at the bottom. The top ends of
the word loops are wire wrapped to the diode leads.
The wire wrap connections are then mass soldered to
ensure a reliable electrical connection. The completed
substack contains 32 spare words and 10 percent
spare planes which are externally accessible. These
spare words and planes may be used without restriction
anywhere in the substack. This means that the substack
will never require rework unless all of the spare
words or spare planes are consumed.

Figure 11 — Memory plane assembly

Figure 12 — The substack

Memory system 

The memory substack and element design does not 
vary with the application; however, some of the memory 
electronics must be tailored to the specific capacity 
and speed required. One typical configuration which 
has been built and tested is a 16K word, 32-bit mili- 
tarized memory with a cycle time of 500 nanoseconds. 
A sketch of this memory (Figure 13) shows the location 
of the memory subassemblies. 

The heat exchanger which mounts on the front face 
of the chassis is not shown in this sketch. Cooling is 
accomplished via thermal conduction from the com- 
ponents to the heat exchanger which is convection 
cooled by external air. 

The stack module (Figure 14) contains a pair of 
1024 word, 64-bit substacks mounted on a common 
plug-in header. The connectors on each side carry the 
drive lines leading to the diode selection matrix. The 
stack modules are field interchangeable within and 
between chassis. 



WEIGHT                  49 lbs.
CAPACITY                16K words, 32 bits
DIMENSIONS              18.3" x 11.3" x 5.5"
CYCLE TIME              500 nanoseconds
INPUT POWER             190 watts
ACCESS TIME             225 nanoseconds
INPUT VOLTAGE           90 volt (internal dc to ac power converter)
INTERFACE               8 channel asynchronous
COOLING                 Conduction to a convection heat exchanger
ENVIRONMENT TOLERANCE   MIL-E-16400 Class 1

Figure 13 — The memory chassis



Figure 14 — The stack module 

Sense/digit configuration 

The total sense/digit line is formed by intercon- 
necting the 1024-bit sections, which are part of the 
individual substacks. Figure 15 shows one of 64 complete
sense/digit lines. Stack modules 1 and 2 form
the left and right halves, respectively, of the 4096-
bit bridge. Stack modules 3 and 4 form a second
bridge and are connected in parallel to a common sense
amplifier and digit driver.

Figure 15 — Sense/digit line configuration
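
The figures quoted for the substacks, stack modules and bridges are mutually consistent, as the following Python sketch shows. The constant names are hypothetical; the numbers are taken from the text.

```python
BITS_PER_ARRAY_SECTION = 1024   # one sense/digit section per substack
SUBSTACKS_PER_MODULE = 2        # a stack module holds a pair of substacks
MODULES_PER_BRIDGE = 2          # modules 1 and 2, or modules 3 and 4
BRIDGES_PER_LINE = 2            # two bridges share one sense amplifier
SENSE_DIGIT_LINES = 64          # one complete line per bit of a 64-bit word

# Bits linked by one bridge and by one complete sense/digit line.
bridge_bits = BITS_PER_ARRAY_SECTION * SUBSTACKS_PER_MODULE * MODULES_PER_BRIDGE
line_bits = bridge_bits * BRIDGES_PER_LINE

# Total storage across all 64 sense/digit lines.
total_bits = line_bits * SENSE_DIGIT_LINES
```

Here `bridge_bits` comes to the 4096 bits stated for each bridge, and `total_bits` comes to 524,288, which is the 16K word by 32-bit capacity, or about 5 X 10^5 bits.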

The common mode choke ensures that currents 
flowing in and out of the bridge are equal, and provides 
both a common mode and differential null at the sense 
terminals to the degree the legs of the bridge are bal- 
anced. This unbalance is controlled so that the digit 
noise induced into the amplifier is less than three times 
the signal, a level within the tolerance of the amplifiers. 
The center driver transformer reduces the time required 
for the digit current to achieve steady state through- 
out the line to 40 nanoseconds. Without this trans- 
former, the time would be 80 nanoseconds. 

Word selection 

Words are selected by the following method. The 
four stack modules, each containing 2048 double length 
words, combine to form the system capacity of 8192 
double length words arranged in a 64 by 128 matrix 
(Figure 16). On the driver side of the matrix, address 
bits S6, S7, and S8 along with S9, S10, and S11 are decoded
to form an eight by eight matrix which selects one of 
64 drivers. Similarly, other address bits are decoded to 
select one of 16 diverters and one of eight diverter 
selectors. Word current passes through the word loop 
which lies at the intersection of the drive line and the 
diverter line. The word current generator controls the 
amplitude and timing of the word current pulse. 
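
The selection scheme can be sketched as a decode of a 13-bit word address into a driver, a diverter and a diverter selector. The Python below is a hedged illustration: the function name is hypothetical, address bit S_n is assumed to be bit n of the integer address, and since the text names only S6 through S11, the assignment of the remaining bits to the diverter (S0 to S3) and to the diverter selector (S4, S5, S12) is purely an assumption.

```python
from typing import Tuple

WORDS = 8192  # double length words in the 64 by 128 matrix


def decode_word_address(addr: int) -> Tuple[int, int, int]:
    """Return (driver, diverter, selector) for a 13-bit word address."""
    if not 0 <= addr < WORDS:
        raise ValueError("address out of range")
    group_a = (addr >> 6) & 0b111     # S6, S7, S8 (from the text)
    group_b = (addr >> 9) & 0b111     # S9, S10, S11 (from the text)
    driver = group_b * 8 + group_a    # eight by eight matrix, 1 of 64
    diverter = addr & 0b1111          # ASSUMED: S0..S3, 1 of 16
    # ASSUMED: S4, S5 and S12 choose 1 of 8 diverter selectors.
    selector = ((addr >> 4) & 0b11) | (((addr >> 12) & 1) << 2)
    return driver, diverter, selector
```

Whatever the real bit assignment, the decode must give each of the 8192 words a unique intersection of one drive line and one diverter line, which this sketch does.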

Timing 

Figure 17 shows the timing for a typical memory
cycle. Prior to time zero, all requests were processed
and the memory was waiting. Then, at time zero, a
memory request arrived at the memory interface.
For this condition, 165 nanoseconds are required to
acknowledge the request, process it through the priority
network and gate the address into the memory address
register.

Figure 16 — Word selection

Figure 17 — Timing for a typical memory cycle

At t = 235 nanoseconds, the address is decoded and 
the proper word and diverter switches have been 
turned on. Word current is driven through the selected 
word loop, interrogating the films in that word. The 
sense signal peaks within the 50 nanosecond rise time 
of the word current. The polarity of the film signal 
indicates the stored state. The sense preamplifier output 
is shown for both a stored “1” and “0.” 
A sense signal from the near end of the sense line
has only a 10 nanosecond delay through the preamplifier;
a signal from the far end of the sense line has the
additional 40 nanosecond delay of the sense line.

The bottom trace shows the length of time the con- 
tents of the data register are valid. During this time, 
the digit driver is turned on; the polarity of the digit 
current determines which state is to be stored. On a 
read cycle, the data is recirculated from the data 
requester. At this time if the data from the requester 
is not available, the memory performs a split write 
cycle while waiting for the data to arrive. 

At t = 600 nanoseconds, priority evaluation of active 
requests begins. If active requests are present, the 
memory will recycle every 500 nanoseconds. 

Test results 

A preproduction model of the memory system de- 
scribed was completed in April 1969 and has been 
undergoing environmental evaluation. Figure 18 con- 
tains “schmoo” data which indicates the threshold of 
the first bit failure, with the memory system running a 
comprehensive pattern of writes, reads, and disturbs. 
Word and digit currents are shown as a percentage of 
deviation from nominal, I_w = 700 mA, I_d = 45 mA.
The center square represents the system's drive current
limits; these limits are ± 5 percent. This is safely within
the usable operating region, as indicated by the
“schmoos”, for ambient temperature ranges of -55°C
to 65°C. The degree of overlap of the high and low
temperature “schmoos” eliminates the need for drive
current temperature compensation.

Figure 18 — Memory system operating margins
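
The absolute drive current limits implied by the nominal values and the 5 percent tolerance can be worked out directly. The Python sketch below is illustrative only; the function and constant names are hypothetical.

```python
from typing import Tuple

I_W_NOMINAL_MA = 700.0   # nominal word current
I_D_NOMINAL_MA = 45.0    # nominal digit current


def drive_limits(nominal_ma: float, tol_percent: float = 5.0) -> Tuple[float, float]:
    """Return (low, high) current limits for a symmetric percent tolerance."""
    delta = nominal_ma * tol_percent / 100.0
    return nominal_ma - delta, nominal_ma + delta


word_limits = drive_limits(I_W_NOMINAL_MA)
digit_limits = drive_limits(I_D_NOMINAL_MA)
```

This gives a word current window of 665 to 735 mA and a digit current window of 42.75 to 47.25 mA for the center square.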

These results show the nominal characteristics of 
the storage element to be quite representative of the 
entire memory. The results also show that there are 
no noise or signal interaction conditions in the stack 
or electronics that will compromise system margins. 

Above the maximum digit current, failure is caused by the
disturb of unselected bits. This limit approaches the
H_c of the films since the element design very effectively
minimizes transverse fields on these bits, which would
otherwise aggravate the condition. Film dispersion and
skew determine the minimum limit of digit current
for an adequate write. Word current could not be
varied above 20 percent of nominal so the schmoo
in this region is not known. Minimum word current
failure is caused by the reduced effective rise time resulting
in a delayed and reduced signal peak.

CONCLUSIONS 

The existing MATED FILM memory design is con- 
servative, yet competitive. As with any new technology, 
future development can be expected to enhance per- 
formance and reduce costs. The two most significant 
growth areas for MATED FILM are higher speed and 
higher bit density. The feasibility of a 200 nanosecond
cycle time for systems up to 10^6 bits has been demonstrated
by several partially populated breadboards.
Expansion of memory in the word direction has little 
effect on cycle time. The high bit density in this direc- 
tion minimizes delay and loading effects. 

Part of the future plan for this memory is to double 
the bit density on the present size array so that each 
array will contain 2048 bits. This will provide such 
direct improvements as reduced costs, increased pro- 
duction capacity, and smaller physical size. 
A computer engineering laboratory

University of Delaware
Newark, Delaware 


INTRODUCTION 

The advent of modern electronic computers has ex- 
panded the scope of nearly all areas of scientific 
endeavor. The electrical engineer is perhaps most 
acutely affected by this expansion by virtue of his 
two-fold interest in computer processes. He is, as are 
his colleagues of other scientific disciplines, excited 
by the computing capabilities now at his disposal. 
Even more, he is deeply involved by virtue of his re- 
sponsibility for the conception and design of the com- 
puter and its hardware adaptation to a variety of ap- 
plications. It is to the second phase of the electrical 
engineer's involvement with computers that our 
educational activities are directed, that is, to his in- 
volvement in the realization of computers or computer- 
like systems. 

The environment 

In order to adequately portray this educational ac- 
tivity, it is necessary to describe the environment in 
which it takes place. This environment will be de- 
scribed as it applies to electrical engineering students 
at the University of Delaware. However, this is not 
an atypical situation and the description could apply 
to many of our universities. 

Present status 

Our senior students are now beginning to come from 
a generation which has grown with the computer. Some 
have started their association with computing ma- 
chines in high school or even earlier. All have been 
through some sort of a problem-oriented first course 
which leads to machine solutions employing a lan- 


guage like FORTRAN. All have become familiar with 
the power of the computer for problem solving as 
early as their first course in Linear Circuit Theory (a 
candid admission here is that some problems at this 
level are indeed a bit forced). By the time these stu- 
dents have become juniors, they are aware of user- 
oriented packages such as ECAP (Electronic Circuit
Analysis Program, an IBM applications program) 
and have employed this type of program in analysis 
of active and passive networks. Modeling and simula- 
tion have become familiar terms and tools to these 
students. 

Except in the very earliest courses, machine compu- 
tation is not introduced artificially. The students have 
been challenged by the problems. Courses have not 
been modified to simply introduce computational 
techniques; rather, the problem areas have no longer 
been artificially compressed to exclude the large sys- 
tem or the nonlinear problem which motivates the com- 
putational techniques. It should be mentioned that 
closed-form solutions and functional relationships are 
sought first. We do not seek to relegate all problems 
to computer solutions but rather to find a reasonable 
balance between this and the more traditional treat- 
ment of problems. 

All of these activities are motivated by the search 
for solutions to generally traditional problems in 
electrical engineering; these activities have been termed 
applications oriented. For the most part, engineer- 
ing educators tend to center their computer related 
activities about the capability of machines for solving 
traditional problems and the vehicle by which this 
computational power may be focused on their particu- 
lar discipline. In such application areas, our educational 
system seems to be responsive to the student's re-
quirements.

Changes 

The electrical engineer's environment is dynamic. 
An educational system which was responsive to the 
needs of the past may not now serve. There are new 
problems of importance, problems which have been 
spawned by the very existence of the computer. Re- 
cent electrical engineering graduates are concerned 
with the design of systems which may involve a general- 
purpose digital computer in an on-line control function, 
a data-retrieval and signal-processing operation or 
some similar real-time application. Control, communi-
cation, pattern recognition, filtering, and numerous
other system functions are frequently developed about 
special-purpose digital computers. As a class, such 
systems certainly represent a significant portion of 
today's electrical engineering effort. With these prob-
lems for motivation, electrical engineering students
view a casual user relationship with computers as 
simply not being relevant to their educational needs. 
Their interests and future responsibilities can only be 
served by an involvement which gives them an inti-
mate experience with this developing environment.

The importance of this changing situation has been 
recognized at the University of Delaware and over the 
past five years, several curriculum modifications have 
been made to strengthen and update our related ac- 
tivities. The subjects which have the strongest rela- 
tion to this area and, as such, the ones which have 
received the greatest attention in our revisions, cover 
such topics as logical design, switching theory and
computer organization. The curriculum modifica- 
tions have extended into such traditional courses as 
electronic circuits, control systems, communication 
systems, and information theory. These courses have 
been modified to emphasize the role of discontinuous 
elements or discrete systems or to introduce the no- 
tion of digital processes. Some course work is immediate- 
ly related to digital systems and their design while 
more remotely related course work simply encourages 
thinking in terms of digital problem solutions. 

Role of the laboratory 

These curricular innovations have permitted the de- 
velopment of the general analysis, synthesis or design 
techniques required for the examination of digital 
systems. Mathematical descriptions of the situation 
are developed from models of these systems. As in any 
physical situation, the conclusions drawn from manipu-
lation of the mathematical models are no better than
the original representation of the system; in addition, 
the modeling process itself is often tempered by the 
degree of rigor which may be mathematically tractable. 
Consequently, the conclusions drawn from analysis 
of the models may fail to give a complete or accurate 
representation of the physical digital system's be- 
havior. In this area then, as in all areas of engineering, 
it is felt that laboratory experience acts as a medium 
through which the reality of the physical situation 
may be brought to the student. He is made aware of 
the limitations of his system models and the implica- 
tions of his modeling process. It is in the laboratory 
that a student must pursue the details of the subject; 
this is where he “puts it all together.” Thus, progress
in the discipline area requires progress in related labora- 
tory experiences. 

Enhancing the quality of laboratory studies in dig- 
ital systems is a process which is not accomplished 
without assiduous attention. This is true of laboratory 
studies in general and it is especially the case for a 
digital systems laboratory. This is at least partially 
due to the plague which has been termed the “tyranny
of numbers.” A common characteristic of digital
systems is certainly that large numbers of elements 
are required and that large numbers of connections 
must be established. Only trivial problems can be
attempted in an afternoon spent in the laboratory. 
Even trivial systems can quickly spread into a maze 
if usual breadboard techniques are used. Laboratory 
budgets can rapidly become unrealistic if even only 
one or two students wish to retain a problem of moder- 
ate complexity. Some early efforts were made to de- 
velop small patching stations and arrangements 
which would help alleviate these problems. These 
efforts served some pedagogical purpose; however, 
their limited versatility and the relatively slow ex- 
pansion process did not permit them to foster the 
desired growth of this area. 

The state of our laboratory has been enhanced by 
the acquisition of a small digital computer and the 
introduction of this machine into a system which ap- 
proximates a generalized interface. This system per- 
mits physical access to all of the essential computer 
functions and incorporates facilities for patching 
connections to external digital logic-modules so that 
an extension of the computer or an interfacing system 
may be rapidly established. We have dubbed the 
system with the acronym DADEC (Design and De- 
monstration Electronic Computer). This system, which 
represents only a modest investment, has proved to 
be a boon in the inspiration of interest and stimulation 
of growth in this study area.

Several laboratory experiments and exercises have
been developed about this DADEC system. Some of
these are extremely simple exercises which serve to 
establish familiarity with the machine, its coding, 
logic levels, etc. Some experiments are rather sophisti- 
cated real-time data processing adventures. The set 
of experiments was designed to support course work 
from sophomore computer science level through elec- 
trical engineering senior projects. 

In this paper, the DADEC system will be described 
and several example problems outlined. The examples 
have been chosen to illustrate the range of educational 
levels which may be served using the experimental 
system, the versatility of the system, an example from 
several of the particular related course areas, and 
some problems which may be of general interest. 

The DADEC system 

The DADEC system is conceptually and practically 
very simple; a block diagram of the system is shown in 
Figure 1. Central to the system is a small general- 
purpose digital computer. A number of digital logic- 
modules (flip-flops, gates, one-shots, line drivers, etc.) 
are mounted in adjacent frames with a patch panel 
which permits the rapid establishment of intercon- 
nections between these peripheral elements and the 
computer. All of the computer interfacing lines are 
available at terminals on this patch panel. 

The majority of the logical building blocks are 
completely unspecified, that is, any available logic 
module may be substituted in the patching arrange- 
ment. It has been found that a few specific functions 
are repeated in a great many interfacing problems, 
and these functions have therefore been prewired on
the patch panel (two binary up-counters and one
binary up-down-counter). Switch-registers, light-regis-
ters, some momentary contact switches and free indi-
cator lights are available as a portion of this generalized
interface. Trunk lines are available for connection to
remote equipment such as analog tape transports,
signal sources, etc.

Figure 1 — DADEC system — Block diagram

An analog-to-digital converter is included in this
system. Students have designed, built and added a 
four-channel analog multiplexer. Students have also 
designed, built and added ten channels of digital-to- 
analog conversion. A portion of this D-A converter is 
used to drive a storage oscilloscope facility. This sys- 
tem is by no means static; we are presently adding 
additional equipment racks for the inclusion of micro- 
logic modules. Plans include the addition of a paper- 
tape reader-punch and a disc to the system. An incre- 
mental digital tape recorder for accumulation of data 
for later off-line processing is to be interfaced by the 
students and added to the system. 

A few comments are in order regarding the selection 
of the particular computer for use in the DADEC 
system. While the computer is general-purpose, it is 
not subject to the same set of constraints which govern 
the selection of a machine for a user oriented com- 
puting center. For our purposes, the most important 
criterion for evaluating a machine is its ability to 
contribute to the educational process. In order to 
contribute, it need not have a tremendous core storage 
capacity or a rapid thru-put capability. Since the 
machine has been in use, its applications have been 
concerned with interface problems or the demonstra- 
tion of system functions and not with its use simply 
as a computational device. The machine need not 
have a long word length; there is very little pedagogy 
which is served by a twenty-four bit machine which 
is not adequately served less expensively by a twelve- 
bit machine. Indeed, the short word length and the 
resulting abbreviated instruction list and core paging 
system actually serve our instructional purposes. The
computer should be easy to interface and adaptable 
to a large variety of peripheral equipment. It should 
have inherent compatibility with a family of logic 
circuits which are readily available. The machine 
should be easy to service; frequent failures of the sys- 
tem are observed, since many of the experiments in- 
volve hardware entry into the internal operation of 
the machine. Finally, it is a desirable attribute if 
the machine has at least a limited FORTRAN language 
compatibility. This enables inexperienced coders to 
immediately use the system once any additional soft-
ware is established for addressing peripheral devices.
One currently has a rather large selection of machines
which meet these objectives (at least 25 such machines).
At the time our decision was made, the list was not so
extensive, but we have found that the Digital Equip- 
ment Corporation's PDP-8 is a very satisfactory, 
moderately priced machine. 

Example experiments 

Several example experiments will be outlined in this 
section. Some of the experiments are, of course, promp- 
ted by the requirement that students must first be 
introduced to this system; however, the predominant 
motivation is problem solving. When the system was 
first conceived, the faculty felt responsible for specifica- 
tion of a number of problems to be implemented. We 
felt that we would be hard pressed to find a sufficient 
number of examples to insure full utilization of the 
system; however, the students have been encouraged
to suggest problems and their exuberance now pre- 
vails. We encourage the students to seek problems 
from other departments on campus and their sug-
gestions have covered the gamut from exotic time-
sharing activities to automatic control of oyster re-
production. The following few examples were chosen
from student suggested projects. 

An introduction to the system 

The Electrical Engineering Department is respon- 
sible for the instruction of computer science majors 
of the College of Arts and Science in a course that is 
oriented toward the hardware and architecture of 
computing systems. For the most part, these students 
will have had no experience with a digital computer 
at a more intimate language level than FORTRAN. 
We find that a simple machine-language program tracing 
experiment is extremely effective in establishing both 
an introduction to the DADEC system and the opera- 
tion of a compiled language. A simple type-out routine 
is coded in FORTRAN; this program is compiled 
and loaded along with the operating system. The rou- 
tine is then executed in a single-step machine-language 
mode so that all of the required steps of masking, code 
conversion, communication with a peripheral device, 
etc., may be examined using the register information 
supplied by the DADEC system. This experiment is, 
of course, extremely simple; however, it does illus- 
trate the fact that this somewhat generalized digital 
system finds use even at early instructional levels. 

An extension of the computer 

These computer science students soon become moder-
ately proficient at programming in the assembly lan-
guage of this machine. Programming instruction is
not a part of the course per se; but the relation between 
“hardware” and “software” which is discussed, quite 
often naturally brings up coding problems. Near the 
end of the course they are capable of more ambitious 
experiments in which additional commands are added 
to the repertoire of the computer. An example of this 
is the addition of a “hardware” EXCLUSIVE-OR
command. In this experiment, a program controlled 
input/output transfer is initiated to transfer the con- 
tents of two memory locations to external registers. 
The peripheral portion of the system performs the 
EXCLUSIVE-OR operation and transfers the data 
back into the accumulator. Now, of course, a program-
mer can accomplish a similar result with a subroutine
of some fifteen or so statements. The student is thus 
faced with an example of what is often called the “hard- 
ware-software” trade-off. 
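The software side of this trade-off can be sketched as follows. The sketch is ours and is not the students' subroutine: it assumes a 12-bit word and a machine offering only AND and two's-complement addition, and it relies on the identity A XOR B = A + B - 2(A AND B). The name `soft_xor` is hypothetical.

```python
MASK = 0o7777  # assumed 12-bit word length


def soft_xor(a: int, b: int) -> int:
    """EXCLUSIVE-OR built only from AND and 12-bit addition/subtraction."""
    both = a & b                      # bit positions where both operands are 1
    total = (a + b) & MASK            # 12-bit sum, carry out of the word discarded
    twice_both = (both + both) & MASK
    # A XOR B = A + B - 2*(A AND B), evaluated modulo 2**12
    return (total - twice_both) & MASK
```

Each Python statement stands for a handful of load, operate and store instructions, which is why the software version runs to many more steps than a single hardware command.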

Automatic testing 

Within the electrical engineering curriculum, em- 
phasis is placed on designing the class of electronic 
circuitry which is usually involved in computers. Each 
student is assigned the problem of accomplishing a 
“worst-case” design of a discrete element NAND/ 
NOR gate. This design requires that a certain fan-in, 
fan-out requirement be met at room temperature with 
any transistor from a given distribution. The DADEC
system is used in the evaluation of the students' de-
signs, that is, in testing of the circuits. The students go
through the procedures of design computation, bread- 
boarding, testing, reevaluation of their design, and 
finally, fabrication of their design on a printed wiring 
board which is acceptable in the DADEC interface 
system. The system then exercises their circuit by 
connecting output loads and applying worst-case 
signals while circuit conditions are tested with the 
analog-to-digital converter. The computer gives the 
student a grade on the lab experiment which indicates 
how well he met the design objectives. 

Encoding and decoding 

A course discipline area is developed in the theory 
of simple sequential systems. As an example problem, 
and one which draws upon the student’s information 
theory background, a single error correction digital 
transmission system is designed. An asynchronous, 
sequential coder and decoder are realized using NAND 
gates. This sub-system is patched into the DADEC in-
terface and the computer is used to generate code
groups which are transmitted to and received from the
transmission system. A random error generator (a com- 
puter subroutine) creates a noisy channel or errors 
in the transmission path. The computer further analyses 
the transmission and reports the performance statis- 
tics of the system. 
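The computer's role in this experiment can be illustrated with a short sketch. The particular code used by the students is not stated here; the sketch assumes a Hamming (7,4) single-error-correcting code, and the function names and the bit-flip probability are hypothetical.

```python
import random


def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(word):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # 1-based position of the error, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def noisy_channel(word, flip_prob, rng):
    """Random error generator: flip each bit independently with flip_prob."""
    return [bit ^ (rng.random() < flip_prob) for bit in word]


def run_trial(n_words=1000, flip_prob=0.02, seed=1):
    """Send random code groups through the channel; return (sent, decoded wrongly)."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_words):
        data = [rng.randint(0, 1) for _ in range(4)]
        received = noisy_channel(encode(data), flip_prob, rng)
        wrong += decode(received) != data
    return n_words, wrong
```

Words that suffer two or more flipped bits are decoded incorrectly, so the count returned by `run_trial` is the kind of performance statistic the computer would report.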

Understanding the computer functions 

The particular computer employed in this system 
has two rapid input-output data transfer mechanisms. 
These are called single-cycle and three-cycle data- 
break transfers. These are rather difficult mechanisms 
for the students to assimilate. This is not because they 
are conceptually difficult but because of the large 
number of signals which must be recognized and care- 
fully timed. A simple experiment serves to illustrate 
both of these data-break facilities. We call this ex- 
periment a hardware clear core. In this interface, 
the single cycle data-break is first called to set zeros 
into core location zero and one into location one. The 
three cycle data-break is then initiated with a word 
count register as location zero accompanied by presen- 
tation of all zeros on the data lines. This has the net 
effect of clearing all core locations except zero and 
one. The single cycle data-break is then again called 
to clear these two locations. This is all accomplished 
with a sequenced switch operation in the interface. 
While the interface is particularly simple, the experi- 
ment does require a sophisticated understanding of 
the operations of the computer. 


Some more challenging experiments 

Student projects are being executed using the 
DADEC system. In this project environment, rather 
comprehensive problem areas are either suggested to 
the students or suggested by the students. They may 
then pursue a solution of the problem for one or per- 
haps two terms of their senior year. Several of these 
problems will be described in greater detail than have 
the previous problems, since these serve to illustrate 
the student's approach to problem solving. 

A pulse-height analyzer 1 

The analysis of pulse-height information is quite 
suitable for digital sub-system solution. This particu-
lar pulse-height analyzer is unique in that the pulses
are of only about 30 nanoseconds duration and the
counting interval must be short (about 50 micro-
seconds) with no dead time between successive count
intervals. The student approached the problem by
designing an asynchronous sequential circuit which
transmits a standardized pulse whenever its input
pulses meet the proper amplitude criterion. A descrip-
tion of this system is shown in Figure 2. Two com-
parators are used as decision elements to determine if
the input signal has passed either the low threshold
voltage (V_L) or the high threshold voltage (V_H). The
results of these decisions, i.e., the outputs of the com-
parators, are described by Boolean variables H and L.
A flow table which summarizes the required circuit
action for any input sequence is shown in Figure 2
(note that flow tables of this type are described in
references such as Maley 7).

Figure 2 — Pulse-height detector — Description

This flow table may successfully be assigned internal
state variables (f_1 and f_2) as shown. The excitation
table may be formed, and from these tables excita-
tion functions (F_1 and F_2) and the output function
(Z) may be derived.

Figure 4 — Pulse-height detector — Performance (time scale 100 ns/div; amplitude 500 mV/div; output 2 volts/div; pulse amplitude between V_L and V_H)



A logic diagram realizing these excitation and out- 
put functions using NAND elements is shown in 
Figure 3. The Z function feeds a pulse amplifier which
produces standardized pulses upon a logical 1 to 0 
input transition. Figure 4 indicates the perfor- 
mance of this pulse-height detector in response to 
pulses which dwell at the threshold level for only some 
10 to 15 nanoseconds. Notice that pulses less than the 
low-threshold or greater than the high-threshold 
produce no output. Pulses with amplitudes between 
these thresholds produce standard 100 nanosecond 
output pulses. 
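The input-output behavior just described can be modeled in software. The sketch below is ours and does not reproduce the flow table of Figure 2; it assumes the input is available as a list of sampled voltages and that the decision is made when the signal falls back below the low threshold. The name `window_discriminator` is hypothetical.

```python
def window_discriminator(samples, v_low, v_high):
    """Return one entry per input pulse that crosses v_low: True if a
    standardized output pulse would be produced, False if the pulse also
    crossed v_high. Pulses that never reach v_low produce no entry."""
    outputs = []
    in_pulse = False
    exceeded_high = False
    for v in samples:
        low = v > v_low       # Boolean variable L
        high = v > v_high     # Boolean variable H
        if low and not in_pulse:          # rising through the low threshold
            in_pulse, exceeded_high = True, False
        if in_pulse and high:
            exceeded_high = True
        if in_pulse and not low:          # pulse has ended
            outputs.append(not exceeded_high)
            in_pulse = False
    return outputs
```

With thresholds of 0.5 and 1.0, the sample list `[0, 0.3, 0, 0.8, 0, 1.5, 0.8, 0]` contains one pulse too small to register, one inside the window and one above it, so the result is `[True, False]`.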

These output pulses are directed to one of a pair of 
up-counting registers in a synchronous sequential 
sub-system. These registers alternately store the count 
for the appropriate counting interval and then dump 
the stored count directly into a memory location using 
the computer data-break facility. The entire analyzer 
interface, which consists of some 45 flip-flops, 50 gates, 
and about five other miscellaneous circuits, is patched 
on the DADEC system. The computer controls the 
counting interval and keeps track of the appropriate 
core locations for data storage.

The computer also controls the two threshold volt-
ages V_H and V_L by directing appropriate numerical
values to two channels of the digital-to-analog con- 
verter. Two additional D/A channels are employed 
for graphical display of the accumulated count as either 
a function of the threshold voltages or time. This 
is accomplished by simply presenting these two analog 
channels and a device selection channel to the X,
Y and Z axes of a cathode-ray tube with storage fa-
cilities.

For this problem, and indeed for all problems of a 
project nature, the software support must also be de-
veloped by the students. In this instance, there are
very few calculations accompanying the process and a 
rather short symbolic program suffices to control the 
experiment, accumulate the data, present the display, 
and punch out information for later entry into a larger 
computer for analysis. In this instance, the DADEC 
system is functioning as an on-line data retrieval 
system with quick-look facilities and off-line data
processing. 

Play ball 2 

An interesting set of experiments is developing in 
the area of physiological monitoring of athletes. Through
the cooperation of the coaches and players of a base-
ball team, it has been possible for us to introduce strain
gages and other transducers in the players' bats,
switches in the players' shoes, contact assemblies in
the bases and ball-speed monitoring equipment in 
the playing field. Small digital sub-systems have been 
designed and built to time the player's run to first 
base after the crack of a bat, to time the pitch, and 
to monitor the position of the pitcher's and batter's 
feet. The DADEC system is used to collect and corre-
late these data and also to sample and digitally repre-
sent the bat acceleration during the swing. These
processes are all moderately simple and their imple- 
mentation is straightforward; they will not be further 
described. 

In this application, the DADEC system is used for 
data accumulation. Information is produced on punched 
paper tape for later analysis on large data-processing 
machines. For the baseball fans, a typical set of data
showing one batter’s swing is presented in Figure 5.
Two channels of bat acceleration are presented. One
channel of acceleration is measured normal to the
axis of the bat in the direction of the label, and the
other is measured normal to this direction. The time
at which the pitcher’s foot leaves the mound (approx-
imately the release time of the pitch) is indicated as
is the shift in weight on the batter’s feet. Time is
measured backwards in this diagram from the instant
at which the ball was hit. It might be mentioned that
this is not a game situation; this trace was taken dur-
ing batting practice and most pitchers would pitch
faster than this in a game situation.

Figure 5 — Typical baseball data

Star gazing 

Astronomers on campus are interested in monitoring 
the emitted light intensity from a star as it passes 
behind the moon. For a brief moment, when the star 
becomes eclipsed, one may observe diffraction of the 
light from the localized star source by the edge of the 
eclipsing moon. If sufficient detail regarding the dif-
fraction pattern during an occultation is recovered
from an experiment, then an apparent stellar diameter 
may be computed. The experimental procedure con- 
sists simply of observing the appropriate star with a 
telescope and focusing the total light collected from 
that star on a photomultiplier tube. The data recovery 
problem is being approached in two ways. One solu- 
tion resembles the previously described pulse-height 
analyzer while the other resembles a portion of the 
baseball data recovery scheme. 

The first solution method treats the output of the 
photomultiplier tube as a pulse source. 3 Pulses are 
again sent to a counter which is directed to count for 
a prescribed interval. At the end of this interval the 
contents of the counter are transferred to a shift regis- 
ter, the counter is cleared and again accepts pulses. 
In this application, the words are shifted out to an 
intermediate storage magnetic tape which is later 
read into the computer off-line. The motivation for 
this mode of operation arises from the requirement 
to develop a portable system which can be carried to 
the telescope site. In this instance, the DADEC sys-
tem was used as the bread-board for all preliminary
design of the specialized digital sub-system. The 
DADEC system is again employed in the recovery of 
the data from the returning digital tapes. 

The second solution consists simply of processing 
analog tape recordings of the stellar occultation. 4 The
analog source is the low-pass filtered output of the same 
photomultiplier tube. The DADEC system controls 
the analog transport and accomplishes the logic for
extraction of sample values from the appropriate sec-
tor of the tape.

In either experimental procedure, the end result is 
a number list which represents the light intensity 
as a function of time during the time of the occultation.
For either set of data, a fast Fourier transform algo- 
rithm is applied to the sampled time functions. The 
relative amplitude of certain frequency components 
yields information from which the stellar diameter 
may be determined. 
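The transform step can be sketched as below. The students' supporting software is in FORTRAN and is not reproduced here; this is an illustrative radix-2 fast Fourier transform that assumes the number of samples is a power of two. The function names are hypothetical, and no attempt is made to show how a stellar diameter is computed from the result.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def relative_amplitudes(samples):
    """Amplitudes of the lower half of the spectrum, scaled so the largest is 1."""
    spectrum = fft([complex(s) for s in samples])
    mags = [abs(c) for c in spectrum[: len(samples) // 2]]
    peak = max(mags) or 1.0
    return [m / peak for m in mags]


# Synthetic check: 64 samples of a cosine with 5 cycles peak in bin 5.
samples = [math.cos(2 * math.pi * 5 * i / 64) for i in range(64)]
amps = relative_amplitudes(samples)
print(amps.index(max(amps)))  # 5
```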

The final phase of data recovery for these problems 
is highly computational. For this reason, it is deemed 
desirable that the supporting software be written in 
FORTRAN. The students must develop facility with 
FORTRAN in order to establish the proper linkage 
for interface control and data entry within the frame- 
work of the language. 

Shocks 

A final example problem to be discussed is a shock- 
measurement system. In this system, two pressure 
transducers are mounted on a moving vehicle. An 
air-borne shock-wave is transmitted past these two 
transducers. The relative time of arrival of the shock- 
wave at each transducer and the length of shock dura- 
tion at each transducer is measured by a system which 
is attached to the vehicle. This portion of the system 
further converts this information for transmission over 
a telemetry link to a receiver. The typical input se- 
quences shown in Figure 6 represent possible received 
signals in this system. The time T_0 to T_1 represents
the shock duration time on one transducer while the
time T_2 to T_3 represents the shock duration time on the
other transducer. The physical reasoning is not im-
portant to our discussion, but the times of interest
are the time differences T_0 to T_1 and T_0 to T_2. In
some instances, for example, the second typical input
sequence, T_2 may precede T_0. Notice that the two
transducers modulate the signal differently so that it
is always possible to identify T_0 as an amplitude in-
crease of two units while T_2 results in an amplitude
increase of one unit. Typical order of magnitude times
for these events are T_0 to T_1 about 200 to 400 µs and
T_0 to T_2 from about -300 to 800 µs. It is deduced from
other engineering calculations that a resolution of one 
microsecond would yield sufficient information in the 
measurement of these time durations. 

Figure 6 — Shock measurement system — Front end

Figure 7 — Shock measurement system — Description

The received signal is fed to three comparators with
three threshold voltages established. The comparators
then yield decisions regarding the crossing of thresh-
old level V_A as a Boolean variable A, V_B as variable
B and V_C as variable C. These inputs are further de-
coded to produce the Boolean variables X_1, X_2, X_3
and X_4 which indicate respectively the number of
thresholds which have been crossed. These signals and
their logical decoding are all shown in Figure 6. The
information of interest could be recovered if these
X variables are fed to a sub-system which produces
one megahertz output pulses on three lines called Z_1,
Z_2, and Z_3. The Z_1 output should then drive an up-
counter which records T_0 to T_1 time differences. The
Z_2 output should drive the up-count line while Z_3
drives the down-count line of an up-down-counter which
records T_1 to T_2 time differences. This will yield the
appropriate time differences in two’s complement
binary arithmetic which is compatible with the com-
puter.
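The way an up-down-counter produces a signed interval directly in two's complement form can be sketched as follows. This is our model and not the circuit of Figure 8: it assumes a 12-bit counter, a one megahertz clock (one tick per microsecond), and two events that may arrive in either order. The function names are hypothetical.

```python
MASK = 0o7777  # assumed 12-bit counter, matching the computer word


def up_down_count(t_first, t_second, n_ticks):
    """Counter contents after n_ticks of a 1 MHz clock. The counter counts
    up while only the first event has occurred and down while only the
    second has occurred; event times are in microseconds."""
    count = 0
    for tick in range(n_ticks):
        first_seen = tick >= t_first
        second_seen = tick >= t_second
        if first_seen and not second_seen:
            count = (count + 1) & MASK      # pulse on the up-count line
        elif second_seen and not first_seen:
            count = (count - 1) & MASK      # pulse on the down-count line
    return count


def to_signed(word):
    """Interpret a 12-bit word as a two's complement integer."""
    return word - 0o10000 if word & 0o4000 else word
```

If the first event occurs at 100 µs and the second at 400 µs the counter holds 300; with the order reversed it holds the 12-bit pattern that `to_signed` reads as -300. No separate sign logic is needed, which is the compatibility with the computer that the text refers to.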

A natural solution of this problem is hence suggested 
as a clocked sequential system. A flow table for such a 
system is shown in Figure 7. (Note that this flow table 
must be interpreted differently from the previous flow 
table and is described in references such as Marcus 8.)
The clock is not shown in the flow table, since its 
operation is understood. A state assignment is ex- 
ecuted and the excitation table, also shown in Figure 
7, is derived from this flow table. This generalized 
excitation table is of the type described in particular 
by Marcus. 8 From these tables the excitation and out-
put functions, shown in Figure 7, may be derived. A
possible logic realization of these functions is shown 
in Figure 8. 

This sub-system does not complete the shock mea-
suring system. The outputs Z_1, Z_2, and Z_3 are fed to
two counters or registers which, upon completion of
an experiment, store the register contents in specified
core locations by calling the computer data-break
facility. The total experiment consists of observing
several hundred such shock waves which are gener-
ated in bursts at a possible rate of some 6,000 shocks
per minute.

The support programming for this system was also 
executed by the students. In this instance, consider- 
able calculation must be applied to the data. It was 
felt that the FORTRAN language was an efficient 
vehicle for such calculations. The FORTRAN pro- 
gram must communicate with the interface and such 
programming problems must be solved by the stu- 
dents. 

What’s under way 

A large number of problems have been suggested 
for solution on this DADEC system. A listing of prob-
lems which have been accepted and are in various
stages of progress is given below. It should be noted
that these are undergraduate project problems and 
as such need not necessarily be new or spectacular in 
their implications. The sole requirement is that the 
problems have engineering application and will allow 
the student to follow a reasonable design procedure 
to achieve his goal. The problem areas under study 
include signal analysis using exponential basis func-
tions, Lebesgue sampling, speech analysis and genera-
tion, automatic x-ray data processing, on-line corre-
lation analysis, physo-acoustic reverberation studies,
graphic displays, and control of psychological experi- 
ments. 

Spin-off projects 

Several projects have developed which are not di- 
rectly related to the DADEC system but are inspired 
by it or find use and application in design with the 
system. One example is a Boolean string manipula- 
tion program which accepts long strings of Boolean 
expressions combined with a variety of operators 
(EXCLUSIVE-OR, AND, OR, NOT, STROKES, 
etc.). 6 The string manipulation program operates
on this set of characters and yields a sum-of-products 
type expression for the Boolean function. Boolean 
simplification algorithms have also been developed. 
A family of programs that permits a high degree of 
operator-machine interaction has been developed 
for the manipulation of flow tables.⁶ These programs 
are useful in flow table manipulations such as the 
elimination of superfluous states, or accomplishing 
appropriate mergers and they are helpful in solving 
the state assignment problem. 

CONCLUSIONS 

The system has been in use for about thirteen school 
months. Our classes are generally small; we graduate 
about thirty electrical engineers per year. The list 
of problems presented is perhaps a measure of the en- 
thusiasm with which students have accepted this 
problem area and DADEC system. The anticipated 
problem of problem suggestion is itself no longer a 
problem. We are now in the enviable position of being 
able to be discriminating in the suggestions which we 
allow to go to completion. The students are beginning 
to vie for time on the system and in order to qualify 
for this time they must present an acceptable technical 
proposal outlining their application. 

The present status of this DADEC system then is 
one in which a number of experiments have been 
developed in support of a variety of course efforts. 
A tremendous possibility exists for future developments 
of this sort. That is, the system configuration is 
sufficiently versatile that only a lack of imagination 
on the student's part limits his open-minded approach to 
a problem. It thus seems that this modest investment 
has sparked considerable interest and motivated the 
students to pursue the detail necessary to solve the 
problems of our new environment. 
Evaluation of an interactive display 
system for teaching numerical analysis 


by P. OLIVER and F. P. BROOKS, JR. 


University of North Carolina 
Chapel Hill, North Carolina 


INTRODUCTION 

The purpose of this study was to develop, use, and 
evaluate an interactive display system for teaching 
selected topics in elementary numerical analysis. We 
were interested in giving students a thorough intuitive 
understanding of the pertinent mathematical functions 
and in measuring the learning effects of an on-line 
graphical capability. 

This system was developed in the spirit of the Culler-Fried 
on-line system.¹ It is similar to it in its emphasis 
on the combination of an interactive and a display 
capability, and its mathematical orientation; it differs 
from it in that it is designed primarily as a teaching 
tool rather than for problem solving. 

The system developed enables the instructor or 
student to enter a variety of mathematical equations 
into the computer in a FORTRAN-like format and 
obtain graphical displays of these functions. In ad- 
dition, the user can illustrate a number of elementary 
numerical methods, such as Newton’s method for 
locating roots of equations, the Euler-Heun method for 
solving ordinary differential equations, and the use of 
interpolating polynomials. The hardware consists of a 
display unit with lightpen and function keyboard and 
a background computer. The software consists of a 
monitor; programs which interpret requests from the 
display user; and programs which produce displays. 

A quantitative evaluation of the feasibility and use- 
fulness of computer graphic techniques in teaching 
elementary numerical analysis raises the following 
questions: 


1. Does the system developed perform a useful 
function? 

2. Does it perform this function better than 
currently available visual facilities, e.g., slides 
or film? Does it help the instructor to prepare 
more informative and interesting lectures? Does 
it give the instructor more flexibility in the 
classroom? Does it encourage the students to 
take a more active interest? Does it improve 
student retention? 

3. Can it be integrated into the teaching process 
so as to avoid being a distracting curiosity? 

4. What does it cost to teach with such a system, 
and how can it be economically feasible? 

5. What sort of computer system (software and 
hardware) is required? 

6. How much manpower, time, and money is 
required to develop such a system? 

Procedures 

A brief non-credit course in elementary numerical 
analysis was offered by the Department of Computer 
and Information Science in the summer of 1968. The 
course was held twice. One group was taught with the 
aid of the on-line graphic system; the other was taught 
conventionally. The class met for thirteen periods, two 
hours nightly. Prerequisites for this course were ele- 
mentary calculus and a familiarity with ordinary 
differential equations. 

The topics selected for use in the course and evalu- 
ation were 
1. Polynomial approximation and interpolation. 

2. Iterative methods of solving for the real roots 
of algebraic equations. 

3. Numerical solutions to ordinary differential 
equations. 

The system was used by the instructor to show 
examples during lectures and by the students in a 
laboratory session devoted to the properties of poly- 
nomials. 

The system had been tested qualitatively by similar 
use during its development. We learned at that time 
that hands-on time by students was useful in removing 
the novelty of the display unit, allowing the students 
to concentrate on the material illustrated. It was also 
found that presenting a series of illustrations concentrating 
on a single topic, e.g., iterative methods to 
find roots of equations, was an effective way of impart- 
ing the key concepts of the material to the students. 

Example 

The use for lecture illustration can be seen from 
an example. The topic of roots of equations was 
introduced with two specific examples from physics 
— a column-buckling problem and a pipe-flow 
problem. Each problem required solving for the 
real roots of an equation. 

Then there was a brief discussion of the tech- 
niques available for solving equations, and the 
field was narrowed to iterative methods. The 
properties common to all iterative methods were 
discussed, and the practical questions which face 
the problem solver, e.g., rate of convergence and 
computational efficiency, were presented. 

The first specific method, linear functional 
iteration with acceleration, was introduced by 
presenting the necessary theorems on the existence 
of solutions and convergence. 

This was followed by a series of illustrative 
examples. These consisted of polynomial and 
non-polynomial equations. The iterative method 
was applied to each and the regions and rates 
of convergence were discussed for each case. In 
applying functional iteration to the equation 

x^3 + 2x^2 + 10x - 20 = 0, 

for example, the several ways in which the iterative 
scheme could be set up (e.g., 

x = 20/(x^2 + 2x + 10), or x = (20 - 2x^2 - x^3)/10) 


and the effects on convergence were illustrated by 
actually displaying each of the cases. 
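
The difference between the two rearrangements can be made concrete with a short sketch. The Python below is an illustration only, not the authors' System/360 code; the function names, the starting value 1.0, the tolerance, and the iteration cap are all assumptions chosen for the example.

```python
def fixed_point(g, x0, tol=1e-10, max_iter=100):
    """Iterate x <- g(x); return (list of iterates, converged flag)."""
    xs = [x0]
    for _ in range(max_iter):
        x_next = g(xs[-1])
        xs.append(x_next)
        # Stop when two successive iterates agree to within tol.
        if abs(x_next - xs[-2]) < tol:
            return xs, True
    return xs, False


# Two rearrangements of x^3 + 2x^2 + 10x - 20 = 0 into the form x = g(x).
def g1(x):
    return 20.0 / (x * x + 2.0 * x + 10.0)


def g2(x):
    return (20.0 - 2.0 * x * x - x ** 3) / 10.0


xs1, ok1 = fixed_point(g1, 1.0)
xs2, ok2 = fixed_point(g2, 1.0)
print("g1 converged:", ok1, "last iterate:", xs1[-1])
print("g2 converged:", ok2, "last iterate:", xs2[-1])
```

Near the root, |g1'(x)| is roughly 0.44 while |g2'(x)| is roughly 1.1, so under these assumptions the first scheme settles on the root and the second keeps oscillating around it.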

The Aitken acceleration scheme was then ap- 
plied to each of the cases previously illustrated, 
and its effects on non-converging as well as con- 
verging sequences of iterates were explored. 
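
A minimal sketch of Aitken's delta-squared extrapolation follows. The text does not say how the system applied the scheme; restarting the iteration from each extrapolated value (often called Steffensen's iteration) is one common choice and is assumed here, as are the function names and tolerances.

```python
def aitken(x0, x1, x2):
    """Aitken's delta-squared extrapolation from three successive iterates."""
    denom = x2 - 2.0 * x1 + x0
    if denom == 0.0:
        # Second difference vanishes: the formula is undefined,
        # so fall back to the latest iterate.
        return x2
    return x0 - (x1 - x0) ** 2 / denom


def accelerated_iteration(g, x0, tol=1e-12, max_iter=50):
    """Take two plain steps, extrapolate, and restart from the result."""
    x = x0
    for _ in range(max_iter):
        x1 = g(x)
        x2 = g(x1)
        x_new = aitken(x, x1, x2)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# A rearrangement of x^3 + 2x^2 + 10x - 20 = 0 whose plain iteration
# does not converge to the root.
def g2(x):
    return (20.0 - 2.0 * x * x - x ** 3) / 10.0


root = accelerated_iteration(g2, 1.0)
print(root)
```

Under these assumptions the accelerated iteration reaches the root even though the unaccelerated sequence for the same rearrangement does not converge.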

Finally, a brief review of the techniques discussed 
and the key concepts discovered through the 
illustrative examples was given by the instructor. 

This cycle of introduction, presentation of theory, 
illustrative examples, and review was followed in each 
of the classroom lectures. 

Besides the lectures, each group was given a labora- 
tory exercise designed to lead the student to the im- 
portant properties of polynomials. The test group 
worked the exercises using the interactive display 
system. The students themselves operated the display 
device after receiving instructions on its use. The 
control group worked the exercises using the blackboard 
as a graphic device. 

An examination was given on each of the three topics, 
as well as a final comprehensive examination covering 
these three topics. Each group was given a one-hour 
examination (the pre-examination) during the first day 
of class. This examination tested mathematical ma- 
turity and previous knowledge of numerical analysis. 

Circumstances did not permit a random assignment 
of students to groups. Students attended the session 
of their choice. 

The course was open to anyone possessing the neces- 
sary prerequisites. Each group was composed largely 
of advanced graduate students with backgrounds in 
statistics, mathematics, and physics, and no previous 
experience in numerical analysis. In each group there 
was one non-student. These two non-students had 
college backgrounds (mathematics and physics) similar 
to those of the students, plus professional backgrounds. 

The test group was composed of four subjects; the 
control group consisted of six. Three additional subjects 
were available for measurements on the second topic, 
the roots of non-linear equations; two belonged to the 
first group, one to the second. These three subjects 
were given the same pretest as the others. 

Design of the experiment 

The experiment performed was of nonrandomized, 
control-group, pretest-posttest design. 

The two groups of observations were viewed as 
independent samples from a population composed of 
two normally distributed subpopulations. It was further 
assumed that each sample group was drawn from a 
distinct subpopulation, and that the subpopulation 
variances were the same, and equal to the population 
variance. 

With these assumptions, the following tests were 
performed:² 

1. A variance-ratio test for each of the post- 
examination results to determine the validity of 
the assumption of equal variances of the two 
groups. 

2. A multivariate F-test to determine if the differ- 
ence in performance of the two groups, taking 
the results of all four post-examinations into 
consideration, was due to chance or to the 
difference in treatments. The mean score of 
each group on the pre-examination was taken 
as the covariate, and the mean scores on the 
four post-examinations were the variables. 

3. A t-test on the within-classes regression co- 
efficient to determine if the difference in the 
initial ability of the two groups as measured by 
the pre-examination scores had a significant 
effect on the post-examination results. 

4. A univariate F-test for each of the four post- 
examinations to test the null hypothesis 

H_0 : m_1 ≤ m_2 

versus its alternative 

H_1 : m_1 > m_2 

where m_1 and m_2 are the mean scores of the 
test and control groups, respectively. The pre- 
examination mean for each group was used as 
a covariate. A significance level of .05 was 
chosen prior to performing the experiment. 
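
As an illustration of the first test in the list, the variance-ratio statistic can be sketched as below. This is the usual textbook form and may differ in detail from the procedure in the cited reference. The function name is hypothetical, and no scores are included because the text does not reproduce them.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator).

    The larger sample variance is placed in the numerator, so F >= 1.
    The result is compared with a tabulated F critical value.
    """
    var_a = variance(sample_a)  # unbiased sample variance
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1
```

A ratio close to 1 is consistent with the assumption of equal group variances; a ratio above the tabulated critical value for the given degrees of freedom would cast doubt on it.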

Instrumentation 

Hardware 

The IBM 2250 Display Unit, Model 1, was used 
for this experiment. This unit is attached to an 
IBM System/360 Model 40H (256K bytes) com- 
puter via a selector channel. 

Images are generated by the 2250 on a cathode 
ray tube with a display area of 12" X 12" and 
1024 by 1024 addressable points.³ 
The following special features were available on 
the unit used for this experiment: 

An 8K byte buffer used for image regeneration. 
A character generator. 

Absolute vector graphics, which allows the 
plotting of vectors by specifying only the 
coordinates of the end points. 

An alphanumeric keyboard for entering charac- 
ters into the buffer. 

A function keyboard consisting of thirty-two 
pushbutton keys, an indicator light for each, 
and eight overlay code sensing switches. 

A lightpen. 

Programming system 

The graphic programming system used in this 
experiment operates under Operating System/360 
(MFT, Version 16). 

At Initial Program Load time a monitor module 
is loaded into a 44K partition reserved specifically 
for graphics. This monitor brings the application 
program residing in the system linkage library into 
the graphic partition and transfers control to it. 

The graphic system is composed of seven load 
modules totaling approximately 5,500 S/360 as- 
sembly language instructions. No more than three 
load modules are ever in core at the same time. A 
dynamic overlay structure is used, so that at most 
35K bytes of memory are used at any one time. 
The multiprogramming environment in which the 
system operates allows the user to operate while 
batch processing and other tasks take place using 
other core partitions. 

The user has the following functions available 
to him: 

General Functions: 

Grid Display- 

The user defines his coordinate system by 
providing upper and lower bounds for the x 
and y axes, and increments (from the lower 
bounds of each axis) at which he desires 
vertical and horizontal lines to be displayed. 
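
One way to picture the grid definition is as a linear map from user coordinates onto the 1024 by 1024 addressable points. The sketch below is an assumption about how such a mapping could work, not a description of the actual program; the function names, the bounds, and the 2.5 increment are illustrative.

```python
RASTER_MAX = 1023  # 1024 addressable points per axis, numbered 0..1023


def to_raster(value, lower, upper):
    """Map a user coordinate in [lower, upper] to a raster coordinate."""
    return round((value - lower) / (upper - lower) * RASTER_MAX)


def grid_lines(lower, upper, increment):
    """User coordinates of grid lines, stepping up from the lower bound."""
    # The small epsilon guards against floating-point shortfall
    # when the range is an exact multiple of the increment.
    count = int((upper - lower) / increment + 1e-9)
    return [lower + i * increment for i in range(count + 1)]


# Illustrative bounds of -5 to 5 with a grid line every 2.5 units.
for x in grid_lines(-5.0, 5.0, 2.5):
    print(x, to_raster(x, -5.0, 5.0))
```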

Polynomial Display- 

Polynomials may be displayed by entering 
their coefficients or their real roots in the 
appropriate data area. Figure 1 displays the 
polynomial x^3 - x, and shows the grid parameters 
along the margins of the display. 
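
The two entry modes are equivalent, as a small sketch shows: a list of real roots can be expanded into coefficients, and the coefficients evaluated by Horner's rule at each plotted x. The helper names are hypothetical and a leading coefficient of 1 is assumed; the text does not describe the system's internal routine.

```python
def coefficients_from_roots(roots):
    """Expand (x - r1)(x - r2)... into coefficients, highest power first."""
    coeffs = [1.0]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        new = coeffs + [0.0]
        for i in range(len(coeffs)):
            new[i + 1] -= r * coeffs[i]
        coeffs = new
    return coeffs


def horner(coeffs, x):
    """Evaluate a polynomial given highest-power-first coefficients."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


# x^3 - x has real roots -1, 0 and 1.
coeffs = coefficients_from_roots([-1.0, 0.0, 1.0])
print(coeffs)
print(horner(coeffs, 2.0))
```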

Point Display- 

Up to fifteen points may be displayed by 
entering the (x,y) coordinates. 

Function Displays- 

Functions of one variable may be displayed 
by defining them in a PL/I-like format. 
Figure 2 is the display of the function 
tan(x) - x. 

Figure 1 — Display of the polynomial x^3 - x 

Redraw Feature- 

All the polynomials in a current display, 
plus the most recently entered points and the 
most recently displayed non-polynomial func- 
tion may be redrawn on a new grid. 

Erase Feature- 

Any single vector or set of points may be 
erased from the screen via use of the lightpen. 

Figure 2 — Display of the function tan(x) - x 

Numerical Analysis Teaching Functions: 

The following numerical analysis techniques 
may be illustrated: 

Polynomial Interpolation 

Iterative Methods for Roots of Equations 

    Linear Iteration 
    Newton’s Method 
    Secant Method 
    Method of False Position 

Solution of Ordinary Differential Equations 

    Multipoint Methods 
    Predictor-corrector Methods 
    Runge-Kutta Method 

Using the display system 

The system was designed as a teaching tool, not a 
problem-solving device, although it has been used as 
such. Ease of use, flexibility, and hardness— i.e., the 
capability of continued operation in the presence of 
disruptions such as invalid entries by users — were 
prime considerations in the system’s design. 

Ease of use is facilitated by use of the programmed 
function keyboard (PFK) as the sole source of “com- 
mands” from the user — this is in contrast with using a 
command language via the alphanumeric keyboard, 
which would require the user to learn the command 
syntax and would demand more manual effort on his part. 
Each command is serviced by a subroutine. This 
modularity of program design makes it easy to add, 
delete, or modify sections of code. The calling sequence 
is uniform for all subroutines. 

The steps required to define a problem and illustrate 
its solution are designed to parallel those a student 
should perform if defining and solving the problem 
with pencil and paper. 

The following example illustrates this. The use of a 
single function key will be considered an “instruction,” 
and will be designated by naming the key. 
(Keys are labeled on the PFK overlay.) Setting of 
parameters on the designated screen locations will be 
indicated by writing the parameter name, followed by 
an equal sign, followed by its value. The meta-instruction 
<initialize> indicates the setting of the 
screen dimension. In the example which follows, the 
coordinates of the lower left-hand corner of the screen 
are (-5,-5), those of the upper right-hand corner 
(5,5). 

The problem is to illustrate three iterations of 
Newton’s method to locate the real root of the equation 
x^3 - x - 1 = 0, using x = 2 as an initial estimate of 
the root. DATAPAD1 refers to a program-defined 
screen location used for entering parameters and 
functions. 

Figure 3 gives the program which will generate the 
desired display. Figures 4-6 represent the resulting 
display after each iteration. 

Thus, to illustrate the use of Newton’s method to 
locate the real root of the polynomial x^3 - x - 1 the 
user performs the following steps: 

1. Define the domain and range of x^3 - x - 1 in 
which he is interested. This is done via the 
alphanumeric keyboard. 

2. Use a PFK key to display the desired coordinate 
system. 

3. Define and display the polynomial, entering its 
coefficients with the alphanumeric keyboard, and 
using a PFK key to enter this definition into 
main core and cause display. 

4. In a similar fashion, define and store the initial 
estimate of the root. 

5. Use a PFK key to illustrate each iteration. 

These actions are those the student or the instructor 
would ordinarily take in solving or illustrating the 
problem, and are taken in the same order. 
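
The arithmetic behind the three NEWTON instructions can be reproduced with a few lines of Python. This is a sketch of the standard Newton update for the stated function, derivative, and initial estimate, not the display system's own code; the function names are assumptions.

```python
def newton_iterates(f, df, x0, steps):
    """Return [x0, x1, ..., x_steps] from x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs


def f(x):
    return x ** 3 - x - 1.0


def df(x):
    return 3.0 * x * x - 1.0


# Three iterations from the initial estimate x = 2, as in the text.
for i, x in enumerate(newton_iterates(f, df, 2.0, 3)):
    print(i, x)
```

Each iterate is where the tangent to the curve at the previous estimate crosses the x axis.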

Instructions               Comments 

(initialize) 
DATAPAD1 = XP3 - X - 1;    define function, x^3 - x - 1 
STOREF                     store definition 
PLOTF                      interpret definition and plot function 
PLACE                      place cursor in DATAPAD1 area 
DATAPAD1 = 3*XP2 - 1;      define derivative, 3x^2 - 1 
STORED                     store derivative definition 
PLACE                      place cursor in DATAPAD1 area 
DATAPAD1 = 2,              define initial estimate, 2 
DATA                       store initial estimate 
INIT                       identify stored value as initial estimate 
NEWTON                     illustrate first iteration 
NEWTON                     illustrate second iteration 
NEWTON                     illustrate third iteration 

Figure 3 — Illustrative program 
Illustration of Newton’s method for finding the real 
root of x^3 - x - 1 = 0 

Figure 4 — Illustration of Newton’s method for finding 
the real root of x^3 - x - 1 = 0, first iteration 

Figure 5 — Illustration of Newton’s method for finding 
the real root of x^3 - x - 1 = 0, second iteration 

Figure 6 — Illustration of Newton’s method for finding 
the real root of x^3 - x - 1 = 0, third iteration 

As a second example representative of the capabilities 
of the programming system, we illustrate the 
use of Euler’s method for solving the differential 
equation 

y' = -2xy 

with initial condition 

y = 1 at x = 0 

The domain and range are 0 < x < 3, -1.5 < y < 1.5. A 
step size of .3 will be used. The large step size is chosen 
so as to emphasize the properties of the method. 

Figure 7 illustrates the approximate solution (the 
straight line segments) together with the true solution 
y = e^{-x^2}. 
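
The Euler computation for this example can be sketched as follows. The code applies the standard forward Euler update with the stated step of .3 over ten steps; the function names and the printed layout are assumptions made for illustration.

```python
import math


def euler(f, x0, y0, h, steps):
    """Forward Euler: y_{n+1} = y_n + h * f(x_n, y_n)."""
    points = [(x0, y0)]
    y = y0
    for i in range(steps):
        x = x0 + i * h
        y = y + h * f(x, y)
        points.append((x0 + (i + 1) * h, y))
    return points


def f(x, y):
    return -2.0 * x * y


# Ten steps of size 0.3 cover the range from x = 0 to x = 3.
for x, y in euler(f, 0.0, 1.0, 0.3, 10):
    print(f"x={x:.1f}  euler={y: .5f}  exact={math.exp(-x * x):.5f}")
```

For this equation each step multiplies y by 1 - 2hx. With h = .3 that factor turns negative once x exceeds about 1.67, so the computed values change sign while the true solution stays positive, which is the kind of behavior a large step brings out.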

Results of the tests 

The three subjects who participated only in the 
pre-examination and the roots of equations examination 
were not considered in performing the multivariate 
F-test, since the test requires that the number of subjects 
from a particular group be equal for each of the exam- 
inations considered. Their scores were used in all the 
other tests. 



Figure 7 — Illustration of Euler’s method to approximate 
the solution of y' = -2xy, y(0) = 1 in the range 
0 < x < 3 

Figure 8 — Scatter diagram, interpolation and 
approximation 

The variance-ratio test supports the hypothesis of 
equal variances for each of the four cases. 

The result of the multivariate F-test indicates that 
the total differences in performance of the two groups 
have only a 5.8 percent probability of being due to 
chance. It appears likely, therefore, that the treatment 
differences had a significant effect on the performance 
differences, taking all four examinations into 
consideration. 

Figure 9 — Scatter diagram, roots of equations 

Figure 10 — Scatter diagram, differential equations 

Figure 11 — Scatter diagram, final examination 

There was significant correlation between the pretest 
and posttest scores for only one of the four cases, the 
final examination. 

Figures 8-11 give the scatter diagrams for the four 
examinations. The scores on each post-examination are 
plotted versus the pre-examination scores. These 
diagrams show that the test group average scores 
improved steadily from test to test, while the control 
group performance fluctuated considerably. The differ- 
ence in the means for the post-examinations increased 
from test to test and was particularly large for the final 
examination. This seems to indicate that use of the 
graphic on-line system helped on retention, and that 
there was greater carry-over of learning from topic to 
topic on the part of the test group. The scatter diagrams 
also indicate greater correlation between pre- and 
post-examination scores for the test group. 

The univariate F-tests for each of the post-examina- 
tions show that the use of the graphic system made a 
significant difference for the roots of equations, differ- 
ential equations, and final examinations. 

The data do not indicate a significant difference in 
performance on the approximation and interpolation 
examination. One may conclude that there was no 
difference, or else that there is insufficient data to 
warrant a definite conclusion. The small sample size 
makes the test performed very weak. Reference to 
power curves shows there would be a probability of .6 of 
error if the hypothesis was accepted that the graphic 
system made no difference.² A definite conclusion cannot 
be reached from these data on the effects of the system 
for the topic of approximation methods. 

Validity of the results 

The data support the assumptions of normal distribu- 
tions and equal group variances. The possible effects of 
previous knowledge or experience in numerical analysis 
were controlled by the use of a pre-examination. Even 
so, these effects were small. The t-tests performed on the 
within-classes regression coefficients indicate that the 
adjustment made for pre-test scores did not affect any 
of the raw scores except those of the final examination. 

The intelligence of the subjects is the major uncon- 
trolled variable in this experiment. It was not possible 
to adjust for intelligence, because scores on a common 
measure of intelligence were not available. If the mem- 
bers of the test group were much brighter than those 
of the control group, the experimental data could be 
explained thus. Such a difference is doubtful in view 
of the similar backgrounds and educational levels of 
the two groups, and in view of the pretest scores. 

Would these results apply to other groups? We 
cannot tell for certain until the experiment has been 
repeated for groups of different backgrounds, scholastic 
levels, and motivation. There is no a priori reason 
to doubt that it can be extended. 
In summary, the following conclusions can be made 
regarding the quantitative results of the experiment: 

1. There is evidence to support the thesis that the 
graphic on-line system provides a useful and 
efficient aid in teaching numerical methods in 
roots of equations and differential equations. 
This effect is sufficient to be demonstrated even 
though weak tests were used. 

2. The graphic on-line capability has a positive 
effect on retention. 

3. Further experimentation with an improved 
system and a larger sample must be made in 
order to reach conclusive results for the topic 
of approximation. 

Qualitative observations 

Besides the numerical data, a number of observations 
can be made regarding the use of the graphic system as 
a result of the course conducted. 

1. Preparation time on the part of the instructor 
averaged about four hours per class hour — 
considerably longer than is generally required. 

2. Up to twenty-five percent more time is required 
to present an equivalent amount of material 
using the graphic system than when not using 
it. This time is used in setting up illustrative 
displays. 

3. This set-up time is distracting to the student. 
Intermittent use of the graphic device during 
a class session is especially distracting. A good 
procedure is to introduce the material briefly; 
present the necessary theorems; give a series of 
examples illustrating the methods and algorithms; 
and terminate the session with a brief summary 
of the material. 

4. The amount of information displayed is important: 
each display should illustrate a single 
principle rather than several. 

5. The ability to regenerate an entire display on 
a changed grid size proved very useful. The 
instructor can illustrate a particular problem 
in the large, and then enlarge a particular part 
to fill the entire screen. 

6. A system will fail at times. The instructor must 
be ready to continue the illustration in progress 
at the blackboard. He must be thoroughly 
familiar with the problems he is presenting. 

7. Whenever possible the instructor should encourage 
the students to discover the point of a 
display. 


8. Hands-on time on the part of the students is 
very useful. One problem of the final examination 
consisted of determining the parameters 
a, b, and c in the polynomial form a(x + b)^2 
+ c so that the resulting polynomial would pass 
through three given points. 

The test group handled this with ease, and each 
individual was able to find the correct values 
and explain the steps taken to arrive at them. 
Most of the control group subjects were not 
successful, and those that were were not system- 
atic in their approach. The purpose of this 
exercise was not simply to find the coefficients. 
Rather, it was to illustrate the effects of varying 
the three parameters on the behavior of the 
polynomial. 

9. Class participation was much greater in the 
test group. The students in this group were 
eager to pursue topics which were not directly 
covered in the lectures. During the lecture on 
iterative methods for finding roots of equations 
the students in the test group discovered the 
effects of applying acceleration to diverging 
sequences of iterates, and did so by their own 
initiative. The test group also worked the ex- 
amination questions much faster than the con- 
trol group, usually starting by drawing a picture. 
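
The final-examination problem mentioned in observation 8 has a direct algebraic solution, sketched below for reference. The approach shown (fit a quadratic by divided differences, then complete the square) is one of several possible and is not claimed to be the one the students used; the function name is hypothetical.

```python
def vertex_form_through_points(p1, p2, p3):
    """Find a, b, c with a*(x + b)**2 + c passing through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Newton divided differences give y = y1 + d1*(x-x1) + p*(x-x1)*(x-x2).
    d1 = (y2 - y1) / (x2 - x1)
    d2 = (y3 - y2) / (x3 - x2)
    p = (d2 - d1) / (x3 - x1)  # coefficient of x^2
    if p == 0:
        raise ValueError("points are collinear; no parabola fits")
    q = d1 - p * (x1 + x2)           # coefficient of x
    r = y1 - d1 * x1 + p * x1 * x2   # constant term
    # Complete the square: p*x^2 + q*x + r = a*(x + b)^2 + c.
    a = p
    b = q / (2.0 * p)
    c = r - q * q / (4.0 * p)
    return a, b, c
```

Varying a changes the width and direction of the parabola, b shifts it horizontally, and c shifts it vertically, which is the effect the exercise was meant to bring out.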

Findings and conclusions 

Experience to date gives tentative answers to the 
questions initially posed : 

1. The results indicate that the interactive display 
system is a valuable and powerful aid in teaching 
selected topics in numerical analysis. 

2. The system performs this function better than 
visual facilities generally used. The graphic and 
the interactive capabilities enable the instructor 
to develop a large number of significant ex- 
amples to illustrate his classroom lectures and 
to make them more interesting. The interactive 
capability provides a flexibility not available 
through slides or filmstrips. Complete response to 
student questions stimulates student inquisitive- 
ness. Student retention is improved, and there 
is a greater carry-over of learning from topic 
to topic. 

3. The system can be effectively integrated into 
the teaching process, but delay time — the time 
necessary to generate new displays — and reli- 
ability are problems which require an unusual 
level of instructor preparation. 
4. The cost of teaching with such a system is not 
high except for the cost of the display unit. 
Running the system requires very little pro- 
cessing time. Preparing class problems requires 
about five minutes of Model 40 CPU time per 
display hour. Classroom presentation averaged 
about two minutes of CPU time per display hour. 
The display unit is costly, but this application 
could use a simpler and cheaper display device. 
Both cost and reliability can be improved by 
using this system to prepare slides for class- 
room use, but extemporaneity and flexibility 
will be sacrificed. 

5. In determining the hardware and software 
capability required for such an interactive dis- 
play system, a number of items must be con- 
sidered. A 12" X 12" screen size is about average 
for display units with vector capability. A smal- 
ler screen size could be tolerated for individual 
use, but not for classroom use. The alpha- 
numeric keyboard is essential for entering data 
into the system, but the function keyboard 
could be eliminated. One could use the standard 
alternative of a menu of lightpen buttons dis- 
played on the screen. One could not readily 
substitute the alphanumeric keyboard for 
function buttons without seriously impairing ease 
of use. The 8K buffer used in this experiment 
could be reduced to 4K without impairing 
system efficiency. 

A graphic programming support such as the 
IBM Basic Programming Services is useful but 
not vital. The applications facilities required 
would depend on the use to be made of the 
system. Those used in this investigation were 
minimal though adequate for teaching the 
selected topics in numerical analysis. 

6. Development of the system described here 
required about 1200 man-hours, with one 
individual devoted to this task over a one-year 
period. Development also required about 163 
hours of S/360 Model 40 time. 

The results of this experiment indicate that use of 
an interactive display system can significantly increase 
the active role of the learner and improve student in- 
sight and understanding of elementary topics in nu- 
merical analysis. 

This is a pilot study. It demonstrates the usefulness 
of such a system only for one group of students with 
one particular subject-matter. To generalize, one would 
have to replicate this experiment with other groups of 
students. 

The study is, however, as useful for what it suggests 
as for what it proves. It suggests specific techniques 
for using such a system. It suggests that we measure 
the separate effect of student hands-on time. A con- 
trolled experiment should be run in which students 
use the graphic system to work a given set of problems, 
studying a set of notes presenting the necessary back- 
ground material. This treatment would not involve an 
instructor except as a monitor. 

Finally, the study suggests the desirable character- 
istics of follow-on systems and ways of making them 
more economical. 
Computer based instruction in computer 
programming — A symbol manipulation- 
list processing approach 

by P. LORTON, JR. and J. SLIMICK 

Institute for Mathematical Studies in the Social Sciences 
Stanford, California 


INTRODUCTION 

Since February, 1969, a computer based course in 
computer programming has been running at an “inner 
city” high school in San Francisco, California. Each 
day ninety high school juniors and seniors in classes 
of fifteen interact with a course designed to teach the 
fundamentals of computer programming for business 
applications. For fifty minutes a day each student is 
on-line with a computer located thirty miles away on 
the Stanford University campus. The purpose of this 
paper is to describe the rationale and the major com- 
ponents of the software system used to implement the 
project. 

Lesson material and programming problems for 
the students are presented on teletypewriters linked 
via telephone lines to the Computer Based Laboratory 
of the Institute of Mathematical Studies in the Social 
Sciences on the Stanford University campus. In this 
laboratory are several computers which form a unique 
system for presenting instructional material. 

The main computer in the system for this project is 
a Digital Equipment Corporation model PDP-1D. 
The PDP-1D is a single address, 18 bit binary machine. 
The machine has 32,768 words of core memory of 
which 20,480 words are used by the time-sharing 
operating system. User programs are permitted up to 
12,288 words of core. The time-sharing system allows 
up to 26 users to run concurrently on the computer. 
This is made possible by the addition to the PDP-1D 
of a very high speed drum with 26 tracks, each capable 
of holding 4096 words. The time-sharing system swaps 
programs in and out of core memory very rapidly using 
a simple priority scheme based on “time-slicing.” 
Because of the necessity for user micro time-sharing 
the programs in this project occupy 10 of the 26 
available tracks. 

The PDP-1 communicates with the students at the 
high school through a smaller computer (DEC PDP-8) 
used to buffer text output. A PDP-8I has been installed 
at the school to perform a similar function at the other 
end of the line. Collins data sets were used in place of 
the PDP-8I during the first year. 

Aim and purpose of the course 

The main goal of this course is to present in very 
general terms the concept of a digital computer as a 
tool for solving business-related problems. As computers 
proliferate in business and industry there will be an 
increased demand for people who can see their jobs in 
terms amenable to computerized operation. Such tasks 
as filing and stockroom control, now available to 
minimally trained individuals, will soon require per- 
sonnel able to see and solve problems in terms under- 
standable to a computer. 

With the goal of training for applications on these 
kinds of problems, the need for something other than 
a “formula translation” approach is evident. Using 
filing and stock control as sample problem areas, an 
approach which stresses symbol-manipulation and 
list-processing suggests itself. Inventories can easily 
be viewed as ordered pairs (a symbol-manipulation
concept) of item names and counts. Retrieving information
from a file can be thought of as a “tree search”
(a list-processing concept). 
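
The two views just described can be sketched in a modern notation. The Python fragment below is an illustration only and is not part of the project's software; the item names, counts, file labels, and function names are all invented for the example.

```python
# An inventory viewed as ordered pairs (item name, count).
# The items and counts are hypothetical.
inventory = [("BOLTS", 120), ("NUTS", 75), ("WASHERS", 300)]


def count_of(item, pairs):
    """Return the count paired with `item`, or None if the item is absent."""
    for name, count in pairs:
        if name == item:
            return count
    return None


# A file viewed as a tree: each node is (label, list of child nodes).
file_tree = ("FILES", [
    ("CUSTOMERS", [("ADAMS", []), ("BAKER", [])]),
    ("SUPPLIERS", [("ACME", [])]),
])


def tree_search(node, target, path=()):
    """Depth-first search; return the tuple of labels leading to `target`."""
    label, children = node
    path = path + (label,)
    if label == target:
        return path
    for child in children:
        found = tree_search(child, target, path)
        if found is not None:
            return found
    return None
```

Both routines manipulate names and structure rather than numerical formulas, which is the point of the approach.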

The advantages of teaching a symbol manipulation- 
list processing (abbreviated: SMLP) language are 
best shown in an analysis of the properties of SMLP 
languages. 

A. SMLP languages operate primarily on symbols
and sets of symbols and, secondarily, on quan- 
tities. This implies that problems as concep- 
tually complex as text scanning become more 
manageable. Once text scanning becomes man- 
ageable, then many applications such as natural 
language-based information retrieval or dialogue 
systems for management information collapse 
into programmable problems. The power of an 
approach which emphasizes symbol manipu- 
lation is that conceptually difficult problems 
often become readily programmable. 

B. The list structure in SMLP languages provides 
an absolutely general form of data and program 
storage. A programmer, given a universal data 
storage facility, can give some attention to op- 
timization of the structure of his data. The 
optimization of data structure cannot be overemphasized
since information retrieval (among
other applications) is not economically possible
without structuring the data so that the comput- 
er answers efficiently the most frequently asked 
questions. 

C. SMLP languages teach the use of pointers and 
indices. While properly part of (B), the simplest 
definition of a pointer is that it is a quantity that 
specifies the location or existence of some other 
quantity; an index can be defined as a quantity 
specifying some base location. The concepts of 
pointer and index are useful in teaching the 
manipulation of data by using references rather 
than moving blocks of data from one place to 
another. An immediate example of an application 
of pointers is data sorting. 

D. SMLP languages allow simple implementation
of push-down stacks. While not of great in- 
trinsic value, push-down stacks simplify the 
calling and structure of subroutines, particularly 
recursive ones. 

E. SMLP languages simplify the treatment of name 
scope problems in a hierarchical store. A funda- 
mental concept of symbolic programming is 
that a quantity can have a name; furthermore, 
it may be desirable to limit the area of the pro- 


gram in which a given name refers to a particu- 
lar quantity. Thus, it is desirable to have a 
method of associating a given name to the rele- 
vant quantity on the basis of “area”; this association
is referred to as “name-scope.”
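
Properties C and D can be illustrated briefly. The following Python sketch is an assumption-laden illustration rather than anything taken from the course: the record names and the `PushDownStack` class are hypothetical. It sorts by rearranging references instead of moving the data, and shows the last-in, first-out behaviour of a push-down stack.

```python
# Data blocks stay where they are; only the "pointers" (indices) are ordered.
records = ["WASHERS", "BOLTS", "NUTS"]  # hypothetical data


def sorted_pointers(data):
    """Return indices into `data` in sorted order without moving the data."""
    return sorted(range(len(data)), key=lambda i: data[i])


class PushDownStack:
    """A minimal push-down (last-in, first-out) stack."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items
```

Reading `records` through the returned indices yields the items in order, while `records` itself is unchanged.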

In general, languages possessing properties A-E provide
exceptionally general approaches to programming
digital computers. It can also be pointed out that the 
COmmon Business Oriented Language (COBOL) re- 
sembles this kind of language more than it resembles a 
“formula translation” language. The general concepts
available through an SMLP language would, it is be- 
lieved, be of considerable help to the students in their 
future efforts to build an understanding of COBOL and
related languages. 

Basic concepts

Good computer programming, under the philosophy 
advanced here, depends on the understanding of cer- 
tain concepts not particularly oriented toward any 
one machine or language. The basic concepts which 
seem necessary for understanding the kind of appli- 
cations programming taught in this project seem to 
divide into concepts which are related to making a 
stored program machine work for the user and concepts 
which are related to what is felt to be the basic task 
of business applications programming: symbol manipu- 
lation-list processing. It is these concepts which form 
the basic content for this course. 

The first nine general concepts in the following list 
are of the first type. The tasks described are all asso- 
ciated with the how and why of making stored pro- 
gram machines do the work required of them. 

I. “Machine” related concepts:

A. Stored Program. Refers to the ability to have 
a set of imperative actions implying some over- 
all task stored in a machine which can execute 
it in some sequential fashion. 

B. Stored Data. Refers to the ability of a machine
to store quantities like “stored program”
actions but not encompassing an overall meaning.

C. Variable. Refers to the ability to name some 
part of the stored program and refer to the 
properties or value of this part through ref- 
erence to its name. 

D. Operations. Refers to the capabilities contained 
in the Central Processing Unit. Two main classes 
of operations are felt important: Arithmetic and 
Non-Arithmetic.
E. Addressing. Refers to the capability of pointing
to various parts of the stored program as well 
as the ability to form data into clusters or ar- 
rays in some useful way. Three sub-concepts 
are felt noteworthy: Indexing, Base addressing, 
and Indirect addressing. 

F. Branching. Refers to the ability of a stored 
program to reorder the sequence of events it 
performs in completing a task. 

G. Loops. Refers to the ability to re-execute a 
subsequence of the stored program to complete 
a repetitive task. 

H. Blocks/Sub-Programs/Procedures. Refers with 
minor differences in emphasis to sub-groupings 
of the stored task which form semi-self con- 
tained programs often capable of being intro- 
duced into the main event sequence by being 
“called.” 

I. Input-Output. Refers to the machine's methods 
for listening and talking to the user. 

The following concepts are more directly related to 
the symbol manipulation-list processing approach to 
the problem space than they are to the problem of 
making a machine work. This does not mean that the 
concepts listed above are unrelated to issues associated 
with the nature of the problem space. Neither does it 
mean that a symbol manipulation-list processing language
is unsuited to presenting them.

II. “Language” related concepts:

A. Data Handling. Refers to the method of viewing 
and manipulating the data a program is to han- 
dle. 

B. Recursion. Refers to a “self calling” ability of 
sub-blocks of the program in an SMLP type 
language. 

C. Arrays and Strings. Refers to a more general 
and efficient way of clustering stored data so 
that its manipulation becomes a simpler task. 

D. Data Structures. Refers to named functions 
which use indexing and pointers to locate ele- 
ments in the stored data. Examples might be 
“trees,” “lists”, “graphs”, etc. 

Languages selected for the project 

Given the conclusions on the advantages of teaching 
a “symbol manipulation-list processing” language and 
the fact that some machine level concepts might use- 
fully be introduced into the course, a language appro- 
priate to each conclusion was selected: a simple as- 


sembly language and a fundamental SMLP language. 
Each of these languages is briefly described below. 

Major components of the project 

The implementation of the conclusions reached in 
the preceding discussion involved developing three 
separate programs which, when loaded into the PDP-1D,
operate as the software system for this project.
The three programs include a “driver” (SLAKER) to
supervise the interaction of the student with the curriculum
material and the language processors, an
interactive assembly language processor (SIMPER),
and an interpretive SMLP language processor (SLOGO).
Each of these parts of the software package is
described below. Appendix A contains a sample lesson 
illustrating many of the components described below. 

Major component : SLAKER 

Introduction 

SLAKER [Slimick-Lorton All Knowing Educator 
Routine] is designed to provide the interface between 
the student at a teletypewriter and the curriculum 
material of the project. The over-riding concern in 
the development of this driver was to provide as much 
freedom and flexibility for each user as is consistent 
with service at reasonable intervals. 

If a student's program would cause a real machine 
to enter an infinite loop or write over his data, then 
this would happen to him in the instructional setting. 
Certain obvious restrictions have been placed on this 
goal. A student's work is not free to “clobber” other 
users (although this might well happen on a “real” 
machine). A student can wipe out his own effort and
experience the pain of having to recover from the error. 

Functions 

The balance of the description of SLAKER is de- 
voted to the major functions it is designed to perform. 

1. Text Emission 

One of the major tasks SLAKER has is the presenta- 
tion of problems to the student at his teletypewriter. 
Several of the disc files attached to the driving program 
contain the curriculum material which is organized 
into four sequences of lessons and problems through 
which the student is to proceed. In addition to the 
lesson-text, the problem code contains certain values 
which indicate various subsections of the problem such 
as the “correct answer” or the “hint,” as well as the
problem type to the driver. 

The four strands into which lessons and problems are 
grouped for this project are: Lesson, Homework, Extra 
Credit, and Test. For problems in the Lesson strand, 
SLAKER is charged with waiting until the student 
enters the correct answer before going on to the next 
problem. With the other three strands, SLAKER 
presents the next problem as soon as any answer is 
entered. In every case the student is informed of the 
correctness of his answer. 
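
The rule governing the four strands can be stated compactly. The Python sketch below is only one reading of the description above; the function name and the "WRONG" message are assumptions (the sample lesson in Appendix A shows only "CORRECT").

```python
STRANDS = ("Lesson", "Homework", "Extra Credit", "Test")


def advance(strand, problem, answer_is_correct):
    """Return (next problem number, message shown to the student).

    In the Lesson strand the student stays on the same problem until the
    answer is correct; in the other strands any answer moves him on.
    """
    if strand not in STRANDS:
        raise ValueError(f"unknown strand: {strand}")
    # "WRONG" is a hypothetical message text, used here for illustration.
    message = "CORRECT" if answer_is_correct else "WRONG"
    if strand == "Lesson" and not answer_is_correct:
        return problem, message
    return problem + 1, message
```

In every case a message is returned, reflecting the statement that the student is always told whether his answer was correct.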

2. Response Evaluation 

After emitting the text for a problem to the user, 
SLAKER monitors his output, collecting it as an 
answer. When the user enters an “evaluate my work” 
request, SLAKER checks his answer according to the 
type of problem the student was given. 

A. Multiple Choice 

Under this format the answer is first compressed 
so that all duplicate characters are eliminated. 
Then the answer is searched for matches with 
the characters recorded as the correct answer. 
Up to twenty characters are collected from the 
student as possible answers for problems stated 
in this format. Only alphabetic characters are 
collected so that spaces, punctuation marks, 
or numbers can be inserted in the answers with- 
out affecting the correctness of the alphabetic 
string. 
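
A minimal sketch of this multiple-choice check follows, in Python. It is an interpretation, not the SLAKER routine itself: the function names are invented, upper-casing the input is an assumption, and treating "correct" as "exactly the recorded letters, in any order" is also an assumption, since the text does not say how partial matches are scored.

```python
MAX_COLLECTED = 20  # the text states that up to twenty characters are collected


def collect_choices(raw):
    """Keep alphabetic characters only, then drop duplicate letters."""
    letters = [ch.upper() for ch in raw if ch.isascii() and ch.isalpha()]
    letters = letters[:MAX_COLLECTED]
    compressed = []
    for ch in letters:
        if ch not in compressed:
            compressed.append(ch)
    return "".join(compressed)


def is_correct(raw, correct_letters):
    """Assumed rule: the answer must contain exactly the recorded letters."""
    return set(collect_choices(raw)) == set(correct_letters.upper())
```

Because spaces, punctuation, and digits are discarded, an answer typed as "C-A" or "a, c" is judged the same as "AC".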

B. Constructed Response

When the student’s input is a response to
this type of problem, all the characters he types,
with the exception of carriage returns and line
feeds, are collected. The checking routine then
examines the response string looking for two
kinds of characters: those that must be present
and those designated as optional. The search
and match routine is of such generality that it
is felt all possible correct answers will be marked
correct if they are defined in the curriculum.

C. Anticipated Alternative

Although not a separate type of problem,
this checking capacity is a separate skill of
SLAKER. If alternative answers are expected
they can be specified and checked for. If a correct
response is not found, then the answer
evaluation routine checks the student’s effort,
in the same fashion, against the strings specified
as possible alternatives. If a match is found,
then an appropriate comment is given and the
student is told to try again, just as if he were
wrong. At present this capability is available
on constructed response problems and single
choice-multiple choice problems.

D. Programming Problems

Evaluation of these problems is done by
asking the student questions about his program
after he has written and debugged it with given data.
This method of evaluation allows the student
flexibility in programming a different solution
than the solution the curriculum writers had
in mind.

3. Communication with Language Interpreters

Since the main aim of the course is to provide rich
and varied experience in programming, a main responsibility
of SLAKER is to provide this contact readily.
Each language differs slightly in how it wants to
be told a student is using it but, basically, SLAKER’s
role is to make the initial contact with the language
processor, pass subsequent information to it and await
the user’s indicated wish to return to the main program.

4. Special requests from the Student Station

The following activities can be requested from a
student station. As a group they provide the student
with considerable flexibility in how he proceeds through
the course.

A. Restart Station. Allows a user to request a
station be restarted from the sign-on point.
Used to correct improper sign-on efforts by
students.

B. Sign-off Station. Allows a user to terminate his
lesson when he is ready. Part of the execution
of this command involves storing where the
student left off on his history file so that he may
restart from this point on the following day.

C. Go to Choice Point. Places the user at a point
where one of the following choices can be made:

1. Return to Last Problem. Allows the student
to continue working from where he
last signed off in the strand he specifies.

2. Go to Specific Lesson. Allows the student
to begin working on the lesson number in
the strand he indicates.

3. Attach a Language Processor. Allows the
student to call forth one of the language
processors available in the course.
D. Skip Problem. In the Lesson strand, only a
correct answer will advance a student on to the 
next problem. This feature allows a student 
to skip out of this loop. As the next problem is 
called, the correct answer to the skipped prob- 
lem is printed. 

E. Give Hint. Commands SLAKER to print the 
“hint” provided for the particular problem. 

F. Erase Answer. The user has the option of 
erasing all of the answers he has typed or merely 
the last character. Erasing the last character 
can be repeated until the entire answer is erased 
if wished. 

G. Communicate with Stanford Monitor. This 
feature allows student stations to type mes- 
sages to the monitor teletypewriter at Stanford. 
Usually, its use is reserved for the classroom 
teachers who may want to correct a lesson, enter 
a new student, or ask a question. As part of 
this feature it is also possible to communicate 
from the monitor teletypewriter to any of the 
student stations. 

Major component : SIMPER 

Introduction 

SIMPER [Simple Instructional Machine for the 
Purpose of Educational Research] represents an at- 
tempt to make available to the student at a teletype- 
writer a simple computer which he can program in a 
manner analogous to “assembly language program- 
ming” on digital computers of modest size. 

This instructional package can be most easily under- 
stood when viewed as consisting of two main parts: 
a machine (SIMPER) and an assembler (SASS). 
The latter is designed to generate the machine code 
for SIMPER. The “machine” is a mythical digital 
computer which can be described in a formal way and 
for which programs can be written. Although the ma- 
chine responds to 18 bit instructions in its “machine 
language,” there is no direct access to the machine via 
18 bit numbers. The purpose of the machine is to teach 
students to program so the machine is programmable 
only through a symbolic assembly language. 

The assembler generates code for SIMPER from 
Assembly Language instructions typed by the student. 
Assemblers generate code instruction by instruction. 
This one generates code for SIMPER immediately 
after each instruction is typed in by the student. This 
feature enables the student to receive immediate cor- 
rection for most syntax errors and, when the student 


avails himself of the option, each line of code can be 
checked immediately to assure the student that the 
assembler translated the student’s instruction as he
wished. 

The current version of SIMPER is designed to time 
share up to 15 students concurrently. The interpreter 
occupies 4096 words of PDP-1D core memory while 
the arrays representing the simulated machines for all 
15 possible users occupy an additional 4096 words of 
memory. 

Description of the SIMPER machine 

SIMPER is a fixed-point, single address machine 
with a memory of variable size (currently 128 words). 
Operations are performed in two general purpose 
registers. Instructions are six digits in length: two digit 
operation code, one digit register specification field, 
and a three digit address field. At present, 16 operations 
can be performed. 
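
The instruction layout just described (two-digit operation code, one-digit register field, three-digit address) can be illustrated with a short Python sketch. It treats the instruction as a string of six decimal digits, which is an assumption about the notation; the names `Instruction`, `decode`, and `encode` are invented for the example.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    opcode: int    # two digits
    register: int  # one digit
    address: int   # three digits


def decode(word: str) -> Instruction:
    """Split a six-digit instruction into its three fields."""
    if len(word) != 6 or not (word.isascii() and word.isdigit()):
        raise ValueError("an instruction is exactly six digits")
    return Instruction(int(word[:2]), int(word[2]), int(word[3:]))


def encode(opcode: int, register: int, address: int) -> str:
    """Pack the three fields back into a six-digit instruction."""
    if not (0 <= opcode <= 99 and 0 <= register <= 9 and 0 <= address <= 999):
        raise ValueError("field out of range")
    return f"{opcode:02d}{register:d}{address:03d}"
```

For instance, `decode("031127")` separates the hypothetical word 031127 into operation 03, register 1, and address 127.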

The size of the machine’s memory is variable de- 
pending on the available space. For this project the 
memory size is 128 decimal (200 octal) locations. This 
size was chosen because it allows the fifteen students 
to run parallel in the space available on the PDP-1D 
and it also means the students’ daily programming ef- 
fort can be “saved” on a disk scratch file of convenient 
length, enabling the student to continue programming 
efforts from session to session. 

Operation of the SIMPER machine 

SIMPER runs by executing the six digit number it 
finds in the memory location pointed to by the pro- 
gram counter. The program counter is updated as part 
of the instruction-fetching activity. An instruction by 
instruction-execution of a program is printed on the 
Teletype. While thus being able to monitor the execu- 
tion of his program, hopefully, a student is given 
special insight into how each instruction operates and 
how a sequence of instructions can be converted into 
meaningful work. This “printing out” of the execution 
sequence also slows down the speed of execution so 
that the work of the machine is easily followed. The 
student can also watch the effects of “bugs” arise and 
develop into problems which require attention. This 
feature is intended to make the debugging of machine 
language programs an easier task. A special flag can 
be set at execution time to suspend this feature. Exe- 
cution speed is then improved by a factor of four. 
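
The fetch, update, execute, and trace cycle described above might be sketched as follows. This Python fragment is not the SIMPER interpreter: the five operation codes and their numbering are hypothetical (the paper does not list the 16 operations), input instructions are omitted, and fixed-point overflow is ignored.

```python
# Hypothetical operation codes, chosen only for this illustration.
LOAD, MUL, STOR, PUT, END = 1, 2, 3, 4, 5
NAMES = {LOAD: "LOAD", MUL: "MUL", STOR: "STOR", PUT: "PUT", END: "END"}


def run(memory, start=0, trace=True, emit=print):
    """Execute six-digit instructions until END; return the values output."""
    registers = [0, 0]          # two general purpose registers
    pc = start                  # program counter
    outputs = []
    while True:
        word = memory[pc]       # fetch the instruction the counter points to
        location = pc
        pc += 1                 # the counter is updated as part of the fetch
        opcode = word // 10000
        reg = (word // 1000) % 10
        address = word % 1000
        if opcode == LOAD:
            registers[reg] = memory[address]
        elif opcode == MUL:
            registers[reg] *= memory[address]
        elif opcode == STOR:
            memory[address] = registers[reg]
        elif opcode == PUT:
            outputs.append(memory[address])
        elif opcode != END:
            raise ValueError(f"unknown operation {opcode:02d} at {location:03d}")
        if trace:               # the trace can be suspended with trace=False
            emit(f"{location:03d} {NAMES[opcode]:<4} {registers[0]} {registers[1]}")
        if opcode == END:
            return outputs


def demo_memory():
    """A 128-word memory holding a program that multiplies 3 by 29."""
    memory = [0] * 128
    memory[0:5] = [10100, 20101, 30100, 40100, 50000]
    memory[100], memory[101] = 3, 29
    return memory
```

Calling `run(demo_memory())` prints one trace line per instruction executed; passing `trace=False` corresponds to setting the flag that suspends the printout.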
The assembler

Description 

The assembler receives its instructions from a student
through a teletypewriter keyboard. Each student
interacting with the program is listened to for characters 
which are collected as an instruction to be assembled. 
Students are served by the assembler in a manner 
which both time shares and “oils the squeaky wheel 
first.” 

When the student is given a problem involving as- 
sembly language programming, he is told to sign on to 
SIMPER. He calls the choice point option and, in 
response to “Where to? ->”, types “SIMPER.” The
student is then in contact with the assembler. He is 
informed that he may now write his program and 
columns labeled “LOC” and “INSTRUCTION” 
are created. In the LOC column the assembler prints 
the number of the memory location into which the 
instruction being written will be assembled. The as- 
sembler then awaits an instruction from the student. 
The student types his instruction and an indicator 
that he is finished. The assembler immediately examines
the text string and attempts to generate SIMPER 
executable code. If all is in order, programming ad- 
vances to the next memory location. If all is not in 
order, the assembler generates an appropriate error 
message. By assembling in real-time after each in- 
struction is entered, the assembler can give immediate 
feedback on syntax errors to the student. 
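
The line-at-a-time behaviour can be sketched in Python as below. This is an illustration under stated assumptions, not SASS itself: the mnemonic table, its numbering, the error message texts, and the pre-supplied symbol table are all hypothetical, and the register field is always left at zero.

```python
# Hypothetical mnemonics and operation codes, for illustration only.
MNEMONICS = {"LOAD": 1, "MUL": 2, "STOR": 3, "PUT": 4, "END": 5}


def assemble_line(text, symbols):
    """Assemble one instruction; return (code, None) or (None, error)."""
    parts = text.split()
    if not parts:
        return None, "EMPTY INSTRUCTION"
    op = parts[0].upper()
    if op not in MNEMONICS:
        return None, f"UNKNOWN OPERATION {op}"
    if op == "END":
        if len(parts) != 1:
            return None, "END TAKES NO OPERAND"
        return MNEMONICS[op] * 10000, None
    if len(parts) != 2:
        return None, f"{op} NEEDS ONE OPERAND"
    if parts[1] not in symbols:
        return None, f"UNDEFINED SYMBOL {parts[1]}"
    return MNEMONICS[op] * 10000 + symbols[parts[1]], None


def session(lines, symbols):
    """Assemble each line as it arrives; a bad line does not advance LOC."""
    memory, messages = [], []
    for text in lines:
        code, error = assemble_line(text, symbols)
        if error is None:
            memory.append(code)     # programming advances to the next location
        else:
            messages.append(error)  # immediate feedback; same location retried
    return memory, messages
```

Because each line is checked as soon as it is entered, a mistyped mnemonic is reported before the student goes on to the next location.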

Major component: SLOGO 

SLOGO (Stanford LOGO) is the I.M.S.S.S. im- 
plementation of LOGO, a computer language developed 
by Wallace Feurzeig and Seymour Papert of Bolt,
Beranek, and Newman expressly for teaching the 
principles of computer programming. SLOGO is 
similar to LISP 1.5 in that both are left prefix lan- 
guages, both have a simple type of function definition, 
and both have similar sets of primitive operations. 
SLOGO functions, unlike those of LISP, have predefined
numbers of arguments which, along with the left prefix
notation, allow SLOGO to require minimal user
punctuation. 
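
Why a fixed number of arguments removes the need for punctuation can be seen in a small evaluator. The Python sketch below is not the SLOGO interpreter; it handles only five of the functions listed in Appendix B, treats "OF" and "AND" as skippable words, and applies FIRST and BUTFIRST to words only. All Python names are invented for the example.

```python
# Each function takes a fixed number of arguments, so no brackets are needed.
ARITY = {"SUM": 2, "DIFFERENCE": 2, "TIMES": 2, "FIRST": 1, "BUTFIRST": 1}
APPLY = {
    "SUM": lambda x, y: x + y,
    "DIFFERENCE": lambda x, y: x - y,
    "TIMES": lambda x, y: x * y,
    "FIRST": lambda w: w[:1],       # words only in this sketch
    "BUTFIRST": lambda w: w[1:],    # words only in this sketch
}
OPTIONAL = {"OF", "AND"}


def evaluate(tokens):
    """Evaluate a left-prefix token list; return (value, unused tokens)."""
    while tokens and tokens[0] in OPTIONAL:
        tokens = tokens[1:]
    if not tokens:
        raise ValueError("missing argument")
    head, rest = tokens[0], tokens[1:]
    if head in ARITY:
        args = []
        for _ in range(ARITY[head]):   # the arity says how many values to read
            value, rest = evaluate(rest)
            args.append(value)
        return APPLY[head](*args), rest
    try:
        return int(head), rest         # a number literal
    except ValueError:
        return head, rest              # anything else is taken as a word
```

Since SUM and TIMES each consume exactly two values, the line `SUM OF 2 AND TIMES OF 3 AND 4` has only one possible reading.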

While SLOGO is an ideal symbol manipulation and 
string processing language, it has a substantial weakness
in not providing structures that are effectively lists of 
lists a la LISP 1.5. While generality is very desirable 
to the programmer, the choice of LISP 1.5 as the sym- 
bol manipulation-list processing language for this 
project posed such severe curriculum problems that 


the attempt to use it was abandoned; thus, SLOGO, 
which has less generality, was implemented instead. 

SLOGO currently time shares five concurrent users; 
each user has a 4096 word drum track that contains 
his own functions, execution stack, etc. SLOGO is a 
re-entrant program when executing commands from 
a user, but it is not re-entrant with respect to console 
input and the queuing apparatus. The currently avail- 
able functions with short definitions attached are listed 
in Appendix B. In the following sections, first, the 
basic data types used in SLOGO are described, and 
immediately thereafter is a discussion of the two 
processing modes of SLOGO. 

Data types in SLOGO 

There are three basic data types in SLOGO: word, 
sentence, and number. A brief explanation of each fol- 
lows. 

(1) A “word” consists of a string of letters, digits, 
or certain punctuation marks; punctuation marks 
that cannot be used are blank, single quote, “>”, 
“<”, and possibly others that depend on which 
version of SLOGO is being run. 

(2) A “sentence” consists of a group of words. Al- 
though one can argue that sentences could consist 
of one or more words, to avoid ambiguity we assume 
that sentences consist of two or more words. 

(3) A “number” consists of a string of decimal 
digits plus a leading minus sign, if the number is nega- 
tive. The largest number acceptable is ± 131,071. 

There are three methods of referring to data: func- 
tion values, pointer variables, and literals. A brief 
explanation of each follows. 

(1) A literal is a direct reference to the indicated 
data. Word and sentence literals are written with the 
single quote (‘) surrounding the desired data. Literal
numbers appear as the number itself, without quotes.
A quoted number is assumed to be a word.

Example: 

The following are word literals : 

‘AARDVARK’ 

‘45’ 

‘3A’

‘MIXTEC’ 

‘THISISAWORD’ 

The following are sentence literals : 

‘AARNOLD IS A APATHETIC AARDVARK’ 
‘ONTOGENY RECAPITULATES PHYLOGENY’

‘12 3 45’ 

‘THIS IS A SENTENCE’ 

The following are number literals : 

1 

1776 

-10 

131071 

(2) Function values. Most of SLOGO’s built-in 
functions and all of the defined functions return a 
value. This value may be subsequently referenced by 
other functions, and the type of this function may be 
any of the three basic types. 

(3) Pointer variables are in reality name pairs, 
where one part of the pair is the name and the other 
part is the value. Names must have type values of 
either word or sentence but never number. The value 
type can be word, sentence, or number. Names are 
written inside closed symbols, which can be either “<” 
and “ > ” or “ — ” for left and right sides. 

Example: 

< ANTEATER > 

<NURNDY IS A GAME> 

POINTER 

<A> 

The peculiar literal ‘’ (an empty pair of quotes) is accepted by the read-in
routines, can be generated internally, and is always 
printed by SLOGO as “NIL”. 

To illustrate the difference between literals and 
pointer variables, assume there is a name pair whose 
name is “HEROINE” and whose value is the sentence 
“OUR GAL SUNDAE.” 

The value, then, of < HEROINE > 

is OUR GAL SUNDAE. 

The value of ‘HEROINE’ 
is HEROINE. 
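
The distinction can be sketched with a table of name pairs. The Python fragment below is an illustration only; it uses the ASCII apostrophe for the single quote, and the function names are invented.

```python
# One name pair: the name HEROINE and its value, as in the example above.
name_pairs = {"HEROINE": "OUR GAL SUNDAE"}


def value_of(reference, pairs):
    """Return the data a literal or a pointer variable refers to."""
    if reference.startswith("<") and reference.endswith(">"):
        # Pointer variable: look the name up and return the paired value.
        return pairs[reference[1:-1].strip()]
    if len(reference) >= 2 and reference[0] == "'" and reference[-1] == "'":
        # Quoted literal: the data itself; a quoted number is a word.
        return reference[1:-1]
    return int(reference)  # unquoted literal number


def display(value):
    """The empty literal is printed as NIL."""
    return "NIL" if value == "" else str(value)
```

The same seven letters thus give two different results depending on whether they are enclosed in quotes or in pointer brackets.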

SLOGO processing modes 

SLOGO operates in two modes, command and defini- 
tion. There is a special character printed at the extreme 
left-hand end of the type line to indicate which mode
SLOGO is in. 

“Command” mode is indicated by a “>” (“greater 
than”) sign, and is the normal mode of operation. In 
command mode, as soon as a line of functions and 
arguments is typed in, terminating with a “.” (period), 
the line is converted to a Polish string of interpretive 


code and then interpreted by the SLOGO interpreter. 
Upon detection of an error or the successful execution 
of the Polish string of code, whatever output produced 
is printed (if PRINT is used) and SLOGO returns to 
a listen state while the next line is being typed in. 

“Definition” mode is indicated by a “->” (“right
arrow”) sign, and is the exceptional mode of operation.
It is entered from command mode when an input line 
has been terminated with a period and begun with a 
“TO”. At that point definition mode is entered and 
cannot be left until the command “END” is entered. 
There is no attempt at function execution while in 
definition mode. The only use of definition mode is 
to define a SLOGO function by entering successive 
lines of functions and arguments. During definition 
mode, checking is done on the function names, validity 
of arguments, etc., but no functions are executed. 
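
The switching between the two modes amounts to a small state machine, sketched here in Python. The class is hypothetical and greatly simplified: "executing" a command merely records it, and the checking of function names and arguments during definition is omitted.

```python
class Session:
    """Tracks whether lines are executed (command) or stored (definition)."""

    def __init__(self):
        self.mode = "command"
        self.definitions = {}   # function name -> list of stored lines
        self.executed = []      # stand-in for actual interpretation
        self._name = None
        self._body = []

    @property
    def prompt(self):
        return ">" if self.mode == "command" else "->"

    def enter(self, line):
        text = line.strip()
        if not text.endswith("."):
            raise ValueError("a line is terminated with a period")
        words = text[:-1].split()
        if self.mode == "command":
            if words and words[0] == "TO":
                if len(words) < 2:
                    raise ValueError("TO needs a function name")
                self.mode, self._name, self._body = "definition", words[1], []
            else:
                self.executed.append(text)
        elif words == ["END"]:
            self.definitions[self._name] = self._body
            self.mode = "command"
        else:
            self._body.append(text)   # stored for later, never executed here
```

A line beginning with TO changes the prompt to the right arrow, and only END restores the “>” prompt.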

SUMMARY 

The purpose of this paper has been to describe the 
software and corresponding rationale for a project 
designed to teach high school students how to use 
computers. The main thought behind the project is 
that, especially for business applications, an approach 
which stressed symbol manipulation and list proc- 
essing skills would very likely prove of long-term use 
to the students. 

To implement this course, a three-part software 
package has been developed which provides guided 
interaction for each student with important programming
concepts. The software package includes a
“driver” to shepherd the student through the course 
material, an assembly language interpreter to provide 
him with an understanding of basic machine operation 
and a symbol manipulation-list processing language
interpreter to provide him with experience in solving 
problems in a suitable higher level language. 

It is worth noting that all of these programs are 
written in a subset of ALGOL-60. A course dedicated
to the teaching of higher level computer languages could 
show the utility of such languages in no better way 
than to have its software packages written in such a 
language. One of the very useful demonstrations this 
project has made has been to show that complete, use- 
ful and efficient computer-based instruction systems 
can be written in a higher level language. 

Preliminary and informal results from the students 
in the course are quite encouraging and tend to sup- 
port the basic philosophy of this approach. There is 
every reason to believe that the future statistical 
analysis of the effects of this course will confirm these 
initial observations. 
APPENDIX A
Sample lessons 

(The following are short examples from the actual curriculum; they have been retyped. Comments 
within brackets are parenthetical comments added to indicate various features.)


3 JULY 1969 

SLAKER (VERSION OF 28 MAY 69) [sign-on]

PLEASE TYPE YOUR NUMBER >11 

(CTRL G TO BEGIN-CTRL T TO RESTART) -> [start at Lesson 68]


WHERE TO? -> 168

LESSON 68: USING TESTS [a SLOGO lesson]


WE CAN USE ‘FIRST,’ ‘BF’ AND SO ON WITH ‘CALL’

IF YOU TYPE THIS: 

CALL FIRST OF BF OF ‘BEARS HIBERNATE IN WINTER’ ‘X’ 
IF WORD? < X > THEN P < X > . 

THEN SLOGO FINDS THAT < X > IS ‘H,’ WHICH IS A WORD, 

SO SLOGO REPLIES:

H 


FOR PROBLEMS 1-6, TYPE WHAT SLOGO REPLIES
TYPE ‘N’ IF NOTHING IS PRINTED.

1. CALL FIRST OF ‘BLUE SKIES’ ‘W’

IF WORD? < W > THEN P < W >.

-> BLUE - - - CORRECT                         [constructed response]

2. CALL DIFF OF 9 AND 6 ‘X.’

IF NUMBER? < X > THEN P TIMES OF 4 AND < X >.

-> DOES SLOGO THINK 3 IS A NUMBER?            [a hint]

-> 12 - - - CORRECT                           [another constructed response]


NOW SIGN ON TO SLOGO AND DO PROBLEMS 7-10. 

AFTER ALL 4 PROBLEMS ARE DONE, TYPE CONTROL A. 

7. TEST TO SEE IF ‘PLACE KICK’ IS A SENTENCE. 

IF IT IS, PRINT ‘IS SEN.’

8. TEST TO SEE IF 7 IS A WORD. IF IT IS, PRINT ‘IS WORD.’ 

9. TEST TO SEE IF ‘1 4 8’ IS A NUMBER. IF IT IS, PRINT THE NUMBER.

10. TEST TO SEE IF ‘P’ IS A WORD. IF IT IS, PRINT BF OF THE WORD. 


WHERE TO? -> SLOGO - - - OK                   [sign-on to SLOGO]

SLOGO - - - THE ORIGINAL CONJURING CAT        [hello from SLOGO]

> IF WORD? ‘P’ THEN P BF ‘P.’                 [“P” is “PRINT.” (“BF” is “BUTFIRST”);
                                               this is solution to 10 above]
= NIL

> TO REVERSE < A >.                           [sample of SLOGO programming]

-> IS < A > ‘ .’

-> IF YES RETURN ‘ .’

-> RETURN WORD LAST < A > AND REVERSE OF BUTLAST < A >.

-> END.

> P REVERSE ‘1234567890.’

= 0987654321

> - - - OK


[return to SLAKER] 
OUTPUT SHOULD BE:

7. IS SEN                                     [correct answers to 7-10; control
                                               S (“SKIP”) takes one on to 11]
8. NO OUTPUT

9. NO OUTPUT

10. NIL


LESSON 11: PROBLEM SOLVING [a SIMPER lesson] 

WRITE A SIMPER PROGRAM TO SOLVE EACH OF THESE PROBLEMS FOR YOU 
1. MARY BOUGHT 3 POUNDS OF CANDY AT 29 CENTS PER POUND. 

WHAT WAS HER BILL? 


WHERE TO? -> SIMPER - - - OK                  [go to SIMPER]
SIMPER (VERSION OF 6 JUN 69)                  [hello from SIMPER]

BEGIN PROGRAMMING
LOC   INSTRUCTION

000 -> BEGIN                                  [possible student solution to this problem]
001 -> GET X
002 -> GET Y
003 -> LOAD X
004 -> MUL Y
005 -> STOR X
006 -> PUT X
007 -> END
008 ->

EXECUTE - - - STARTING LOC -> 0 AND ENDING LOC -> 7     [execution of solution]
PROGRAM EXECUTED ON 3 JULY 1969

P C   INSTR   REG A   REG B
000   BEGN    0       32768
INPUT -> 3
INPUT -> 29
003   LOAD    3       32768
004   MUL     87      32768
005   STOR    87      32768
OUTPUT = 87
007   END     87      32768

- - - END OF EXECUTION, CONTINUE

008 -> - - - OK                               [back to SLAKER]

HER BILL WAS 87 CENTS. IF YOUR PROGRAM SAID   [answer]
OUTPUT = 87, SKIP ON.

->                                            [skip on]

2. A RECTANGLE IS 8 INCHES LONG AND 4 INCHES WIDE. 

FIND ITS AREA. 

-> TO FIND THE AREA OF A RECTANGLE, MULTIPLY THE LENGTH    [hint]
TIMES THE WIDTH.


WHERE TO? -> SIMPER - - - OK                  [sign-on to SIMPER]
APPENDIX B
Concise guide to SLOGO 
(Optional words are italic) . 


WORDS OF X AND Y 
SENTENCE OF X AND Y 
FIRST OF X 

BUTFIRST OF X 

LAST OF X 

BUTLAST OF X 

SUM OF X AND Y 
DIFFERENCE OF X AND Y 
TIMES OF X AND Y 
QUOTIENT OF X AND Y 
IS X Y 


IF YES THEN Si, when Si is some 
executable statement 
IF NO THEN Si 
IF WORD? OF X THEN Si 
IF SENTENCE ? OF X THEN Si 
IF NUMBER? OF X THEN Si 
TO NAME OF < X > AND < Y > 

RETURN X 
END 

GO TO LINE N 

CALL THING X NAME Y 

LOGO 

ERASE name 
TRACE 
UNTRACE 
PRINT X 


produces a word which is X concatenated with Y. 
produces a sentence of Y appended to X. 
if X is a word, result is the first letter; if X is a sentence, 
result is the first word. 

if X is a word, result is all but the first letter; if X is a 
sentence, result is all but the first word, 
if X is a word, result is the last character; if X is a 
sentence, result is the last word. 

if X is a word, result is all but the last character; if X is 
a sentence, result is all but the last word. 

X + Y 
X - Y 
X® Y 
X -f- Y 

sets internal flag to true if X = Y (equality of arguments 
for numbers; character by character equality of words; 
word by word equality of sentences) ; false otherwise, 
execute Si if internal flag is true; ignore Si if false. 

execute Si if internal flag is false; ignore Si if true, 
executes Si if X is a word, 
executes Si if X is a sentence, 
executes Si if X is a number. 

begins definition of a function named 1 k name” and whose 

formal parameters are X and Y. 

exit from current function with value X. 

complete definition of function and insert RETURN 1 ’ 

in the code for safety’s sake. 

branching statement to be used inside of user-defined 
functions. 

associates the name produced by evaluating Y with the 

value produced by evaluating X. 

reset. 

erase the function named “name.” 
turn on trace for all user-defined functions, 
turn off the trace. 

print the value of X on the user’s teletype. 
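
The word and sentence operations in this guide can be imitated in a few lines of Python. This is a sketch under stated assumptions, not SLOGO itself: a word is modeled as a Python string, a sentence as a list of words, and the recursive `reverse` function assumes that the base case of the REVERSE sample tests for the empty word (the quoted literal is not legible in the printed sample).

```python
# Assumed representation: word = str, sentence = list of str.

def first(x):
    """First letter of a word, or first word of a sentence."""
    return x[0]


def butfirst(x):
    """All but the first letter (word) or first word (sentence)."""
    return x[1:]


def last(x):
    """Last character of a word, or last word of a sentence."""
    return x[-1]


def butlast(x):
    """All but the last character (word) or last word (sentence)."""
    return x[:-1]


def word(x, y):
    """Concatenate two words."""
    return x + y


def sentence(x, y):
    """Append Y to X, treating a bare word as a one-word sentence."""
    def as_list(v):
        return v if isinstance(v, list) else [v]
    return as_list(x) + as_list(y)


def reverse(a):
    """Recursive reversal in the style of the REVERSE sample."""
    if a == "":          # assumed base case: the empty word
        return ""
    return word(last(a), reverse(butlast(a)))


if __name__ == "__main__":
    print(reverse("1234567890"))  # prints 0987654321
```

Because Python slicing works on both strings and lists, each of the four selector functions covers the word case and the sentence case with the same body, which mirrors the two-part definitions in the guide.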
INTRODUCTION

Any input device used in conjunction with a computer 
controlled display for interactive information ex- 
change between man and computer must function 
as a position encoder. Input devices for handling two 
dimensional positional information can be grouped 
into two general types, one type encoding absolute 
positions and the other encoding changes in position. 

Devices accepting absolute positions rely on a direct 
mapping of positions from an input surface to a dis- 
play surface. The input surface is usually a flat plate 
or tablet on which positions are indicated with a mova- 
ble hand held stylus. One consideration in developing 
a device of this type is the location of the input sur- 
face with respect to the display surface. The mapping 
relationship between surfaces is simplified for the user 
to the extent of being instinctive if the two surfaces 
are coincident. If the input surface is superimposed on 
the display surface with a finite separation, the user 
has to cope with the problem of parallax. A transparent 
input surface and a one to one mapping scale are im- 
plicit in these two arrangements. A third possibility 
is that the two surfaces are in different physical loca- 
tions. This makes it necessary for the user to rely on a 
visual feedback process by observing the mapping of 
his selected position in relation to the desired posi- 
tion and then modifying his selection to decrease the 
difference. 

The stylus used for indicating positions on the sur- 
face is typically an active one which contains a signal 
sensor, as for example, in the RAND Tablet, 1 or a 
signal radiator, as in a magnetically coupled device
described by Lewin. 2 The stylus must be large enough
to accommodate the necessary components, and, in
addition, present devices require a cable connecting
the stylus to the console for signal transmission. This
makes some active styli difficult to use with dexterity.

Input devices for encoding position increments do 
not have separate input surfaces, and their operation 
depends entirely on visual feedback from the display 
surface. This type of device consists of a mechanical 
assembly having at least two degrees of freedom, such 
as a joy-stick or track-ball, which can be manipulated 
to indicate changes in the position of a cursor displayed 
on the screen. 

Touch sensitive overlay 

Work on the device described in this paper began 
with several primary objectives which are related to 
the considerations outlined above. These objectives 
are: 

1. The device must encode absolute positions 
indicated by the user. 

2. The input surface must be as close as possible 
to the display surface. 

3. Positions are to be indicated with a passive 
stylus, including a human finger. 

The first two objectives ensure that the relation- 
ships between the positional information that the 
user must provide and the information he observes 
on the screen are fundamental ones. This reduces 
the time and mental effort expended, especially when 
the device is used for item selection, that is, the selection
of a subset from a set of items shown on the display
surface.

Assuming that the first two objectives are met, the 
third allows one to select items or positions on the 
screen merely by pointing at them with a finger. Because
pointing with a finger is man's most natural
method of indicating selection, a touch activated device
creates a minimum of distraction for the user.
In fact, an ideal implementation of the three objec- 
tives listed above would result in an input device 
that was apparent to the user in function rather than 
in substance. 

Admittedly, the human finger is a rather coarse
stylus, but the resolution attainable is sufficient for
many types of manual information entry. The words 
or phrases displayed for selection in an information 
retrieval system could be in a format suitable for this 
type of input technique. If a conventional keyboard 
is used in conjunction with the display terminal, a 
touch activated display overlay reduces the time spent 
in going from keyboard to display by eliminating the 
intermediate step of picking up a stylus. In addition, 
a portion of the display screen could be used as a 
touch sensitive keyboard with dynamic computer 
control of the associated key functions. The apparent 
simplicity, both physically and functionally, of this 
type of input device is a significant advantage if the 
user is a young child communicating with a computer- 
assisted instruction system. 

For information entry requiring more resolution 
than one can obtain with a finger, a suitable passive 
stylus could resemble an ordinary pencil with its con- 
venient size, light weight, and freedom of movement. 

One touch sensitive device 3 that has been developed 
for use with a CRT consists of a number of wires 
terminating at the front surface of the display tube. 
Each wire forms the arm of an AC bridge which is 
unbalanced by body capacitance. A second device, 
developed by Control Data Corporation, has a series 
of translucent, touch-activated strips in front of a 
CRT display. 

The approach taken in our case was to use an echo 
ranging technique with elastic surface waves. Echo 
ranging with pulsed ultrasonic surface waves has been 
applied successfully for a number of years in the field 
of flaw detection for structural materials. The propa- 
gation delay of ultrasonic elastic waves has been used 
as the basis for graphic input devices for a computer. 
However, these devices do not employ echo ranging 
and consist basically of fixed sources or radiators with 
the sensor in a movable stylus. One of these, developed 
by Woo at IBM, 4 also uses surface waves on a glass
plate. The Lincoln Wand 5 provides a three dimensional
input capability by using ultrasonic waves propagating 
in air. 

In the device developed at NRC, the radiator and 
sensor are physically the same piezoelectric transducer 
which is electrically switched between the driving 
circuitry and the echo receiving circuitry. Pulse modulated
surface waves are produced on a transparent
glass plate, and any object contacting the surface 
reflects some of the wave energy back to the source. 
The distance from the radiator/sensor to the target 
is proportional to the time between the radiator pulse 
and the reception of the echo pulse. 

Surface wave characteristics 

An elastic surface wave can be represented mathe- 
matically as a combination of inhomogeneous longi- 
tudinal and transverse waves. This is exemplified by 
the particle displacements for a surface wave. The 
particles describe elliptical orbits with the major axis 
perpendicular to the surface and the minor axis parallel 
to the direction of propagation, corresponding to the
transverse and longitudinal components respectively. 

The particle displacements decrease exponentially 
with depth into the material, the depth decay factor
being a function of the wavelength and the material.
For glass, the wave energy at a depth of one wavelength
is only about three percent of its value at the
surface. A practical implication of this result is that,
to a close approximation, a plate several wavelengths
thick appears as the solid half-space necessary for 
true surface wave propagation. 

Waves on the free surface of a solid half-space, which 
are also known as Rayleigh waves, are not dispersive 
and their phase velocity depends only on the properties 
of the material on which they are propagating. For 
plate glass the velocity is 10,400 ft/sec. 

The amplitude of all elastic waves decreases with 
distance from the source through three mechanisms — 
beam divergence, scattering, and absorption. Because 
a surface wave is essentially a two-dimensional phenomenon,
the decrease in amplitude due to beam divergence
is proportional to $1/\sqrt{r}$, compared to $1/r$ for
spatial waves, where $r$ is the distance from the source.
The attenuation due to scattering and absorption is 
related to that of spatial waves, with the attenuation 
factor being approximately proportional to frequency 
in the ultrasonic range. The attenuation coefficient of 
plate glass measured at 8 MHz is 0.40 nepers/inch. 
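
To give a feel for what 0.40 nepers/inch means in practice, the sketch below converts the absorption and scattering loss over a given path into decibels and scales the coefficient linearly with frequency, as the paragraph above states. It is illustrative only: beam divergence is not modeled, the function names are hypothetical, and the 24 inch path used in the usage line assumes a round trip over a 10 inch surface plus a 2 inch unusable zone.

```python
import math

# Measured value quoted in the text: 0.40 nepers/inch at 8 MHz on plate glass.
NEPERS_PER_INCH_AT_8_MHZ = 0.40


def attenuation_coefficient(freq_mhz):
    """Coefficient in nepers/inch, assumed proportional to frequency."""
    return NEPERS_PER_INCH_AT_8_MHZ * freq_mhz / 8.0


def absorption_loss_db(path_inches, freq_mhz):
    """Amplitude loss in dB over the path (divergence is not included)."""
    nepers = attenuation_coefficient(freq_mhz) * path_inches
    return nepers * 20.0 / math.log(10.0)   # one neper is about 8.686 dB


if __name__ == "__main__":
    for freq in (8.0, 4.0):
        print(freq, "MHz:", round(absorption_loss_db(24.0, freq), 1), "dB")
```

Under the proportionality assumption, halving the carrier frequency halves the loss in decibels over the same path.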

An interesting property of surface waves is their 
ability to propagate along curved surfaces. If the radius
of curvature is large with respect to the wavelength,
there is only a slight change in attenuation and
velocity. This property makes it possible to employ 
the echo ranging principle described to produce a de- 
vice which uses the curved front face of a CRT as the 
input surface, reducing parallax to a practical minimum. 

Echo ranging parameters

All systems using echo ranging for target location 
have similar design parameters. Although considerable 
effort has gone into the refinement of echo ranging 
techniques for radar and sonar, the additional com- 
plexity and cost of such developments as signal cor- 
relation makes them impractical for this application. 

For two dimensional space, the stylus location can
be determined by measuring its distance from two 
fixed points or its normal distance from two fixed 
lines. The latter method was chosen and implemented 
by alternately scanning the surface in orthogonal 
directions using linear transducer arrays fixed at the 
edges of a square plate. This method can provide the 
stylus location directly in terms of x-y coordinates 
without additional computation. The line reference 
method also avoids the problem of edge reflections 
obscuring valid echoes. Furthermore, with the large 
beamwidths needed in the first method, it is difficult 
to achieve an adequate surface wave power density 
at frequencies in the megahertz range. 

The choice of plate material was limited by the 
transparency requirement. Ordinary plate glass was 
found to be satisfactory although its attenuation coef- 
ficient is higher than that of fused quartz and some 
optical glass. All the glass tested had several surface 
flaws per square foot but most of these were shallow 
enough to be eliminated by localized hand grinding 
and polishing. The plate size was chosen to provide a 
usable surface of 10 X 10 inches. 

Factors involved in the choice of carrier frequency 
include the positional resolution, the surface wave 
attenuation, the radiator beamwidth, the gain in 
reflected energy for a given target size, and the availa- 
bility of piezoelectric transducers. A carrier frequency 
of 8 MHz was chosen for the initial device with the 
corresponding wavelength on glass being about 
0.015 inch. 
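
The quoted wavelength follows from the surface wave velocity for plate glass given earlier (10,400 ft/sec) divided by the carrier frequency. A minimal check, with a hypothetical function name:

```python
# Velocity for plate glass as quoted in the text: 10,400 ft/sec.
SURFACE_WAVE_VELOCITY_IN_PER_S = 10_400 * 12   # inches per second


def wavelength_inches(freq_hz):
    """Surface wavelength on plate glass at the given carrier frequency."""
    return SURFACE_WAVE_VELOCITY_IN_PER_S / freq_hz


if __name__ == "__main__":
    print(wavelength_inches(8e6))   # about 0.0156 inch at 8 MHz
    print(wavelength_inches(4e6))   # about 0.0312 inch at 4 MHz
```

At 8 MHz this gives roughly 0.0156 inch, consistent with the "about 0.015 inch" figure, so even a thin glass plate is many wavelengths thick.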

Radiator/sensor development

One of the most efficient and convenient ways of 
generating surface waves at frequencies in the low 
megahertz range is by the mode conversion of a longi- 
tudinal spatial wave. This occurs when a longitudinal 
wave is incident on an interface between two solid
materials with an angle of incidence large enough
that total internal reflection occurs, and no energy is
refracted into the second material. In order that the
boundary conditions remain satisfied at the interface
for this case, inhomogeneous longitudinal and transverse
waves are produced in the second material. In
other words, a surface wave is generated.

A practical implementation of this, shown in Figure
1, consists of a thickness mode piezoelectric transducer
mechanically coupled to a solid prism. Maximum
surface wave output occurs for a prism angle, $\alpha_i$,
such that the spatial period of the surface perturbations
corresponds to the wavelength of the resultant
surface waves at the frequency of the incident wave.
That is, when

$$c_L = c_s \sin \alpha_i$$

where $c_L$ is the longitudinal wave velocity in the
prism,

and $c_s$ is the surface wave velocity.

For this optimum angle to be real, the prism material
must be chosen so that $c_L < c_s$. One of the commonly
available materials that meets this velocity requirement
for generating surface waves on glass is an acrylic
resin such as Plexiglas or Lucite.
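
The prism angle condition can be evaluated numerically. In the sketch below the surface wave velocity for glass is the 10,400 ft/sec figure from the text converted to metres per second; the longitudinal velocity for acrylic (2700 m/s) is an assumed typical value that does not come from the paper, and the function name is hypothetical.

```python
import math

C_S_GLASS_M_PER_S = 10_400 * 0.3048     # surface wave velocity from the text
C_L_ACRYLIC_M_PER_S = 2700.0            # ASSUMED typical value, not from the paper


def optimum_prism_angle_deg(c_l, c_s):
    """Angle satisfying c_L = c_s * sin(alpha); requires c_L < c_s."""
    if not 0.0 < c_l < c_s:
        raise ValueError("a real angle exists only when 0 < c_L < c_s")
    return math.degrees(math.asin(c_l / c_s))


if __name__ == "__main__":
    angle = optimum_prism_angle_deg(C_L_ACRYLIC_M_PER_S, C_S_GLASS_M_PER_S)
    print(round(angle, 1), "degrees")
```

The guard in the function expresses the requirement stated above: if the prism material were faster than the surface wave, the sine would exceed one and no real angle would exist.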

The same configuration also makes an efficient sur- 
face wave sensor. In this case, incident surface waves 
excite spatial waves in the prism with an angle of
propagation determined by the velocity ratio. When the
same transducer is used for both sending and receiving,
the energy that was internally reflected within the
prism during the send interval appears as clutter or
noise during the receive interval. Although this excess
energy is gradually absorbed by the prism material,
its effect can be reduced by modifying the prism shape
and coating it with an absorbent material. For the
transducers actually constructed, the first two inches 
of range could not be used because of the clutter. 

The piezoelectric transducers are made of a lead 
zirconate-lead titanate ceramic having a thickness 
mode electro-mechanical coupling coefficient of 0.66. 
This material is relatively good for energy transforma- 
tion in both directions. The bandwidth and mechanical 
output power of a piezoelectric transducer are related 
to the mechanical impedance of the materials to which 
it is coupled. After some experimentation with quarter
wave impedance matching transformers and various
backing materials, it was decided to sacrifice bandwidth
for sensitivity by using air-backed transducers
bonded directly to the prism. The result was a radiator
fractional bandwidth of 20 percent. The parallel components
of the electrical input impedance for a small
test array constructed in this way are shown in Figure 2.

Figure 2 — Parallel impedance components for a series
connected array of four 1/2 X 3/4 inch transducers

For an 8 MHz pulse modulated signal with a 1.6
MHz bandwidth, the minimum resolvable stylus
movement should be about 0.04 inch. As will be explained
later, this resolution was attained but was unusable
in the first device constructed.
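
The 0.04 inch figure is consistent with the usual echo-ranging rule of thumb that range resolution is the propagation velocity divided by twice the signal bandwidth. The paper does not state which relation was used, so the formula below is an assumption, and the function name is hypothetical.

```python
SURFACE_WAVE_VELOCITY_IN_PER_S = 10_400 * 12   # plate glass, from the text


def range_resolution_inches(bandwidth_hz):
    """Assumed relation: resolution = velocity / (2 * bandwidth)."""
    return SURFACE_WAVE_VELOCITY_IN_PER_S / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    # 20 percent fractional bandwidth at 8 MHz is 1.6 MHz.
    print(range_resolution_inches(0.20 * 8e6))   # about 0.039 inch
```

The factor of two reflects the round trip: a stylus displacement changes the echo path length by twice that displacement.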

Array design 

The method of target location being used requires 
a line source of waves having uniform amplitude and 
phase across a ten inch width. To combine separate 
radiator elements into a linear array with the desired 
characteristics, the radiation pattern of individual 
elements must be known. An expression for the direc- 
tivity characteristics of a prism type of radiator has 
been derived, 6 and it yields results similar to the
(sin x)/x function for spatial radiators. Figure 3 compares
values computed for an 8 MHz radiator using this ex- 
pression with experimentally measured values. 

For practical plate dimensions and transducer sizes, 
the usable surface area lies in the far-field region of the 
individual elements but in the near-field region of the 
overall array. By computing the response for various 
linear array configurations, a radiator width of 0.465 
inch, and a spacing of 0.565 inch, were selected. 

After the arrays were assembled and tested, the 
measured radiation pattern was more irregular than the 
computations indicated. This discrepancy was attributed
to the variation in spacing, orientation, and
bond characteristics due to assembly tolerances and
the variations in transducer sensitivity. The gaps in
the pattern were sufficiently large and numerous that 
it was necessary to add a second set of arrays on the 
opposite sides of the plate. These are offset with respect 
to the first so that the beams from opposite arrays are 
effectively interleaved. The arrays are energized se- 
quentially to avoid mutual interference. 

The maximum two-way propagation time for a ten 
inch usable surface and a two inch buffer zone is about 
200 µsec. Therefore, even with four separate arrays,
the sampling rate can be greater than 1 kHz, which
is more than adequate to follow normal stylus motion. 
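
The timing budget in this paragraph can be reproduced from the surface wave velocity quoted earlier. The sketch assumes that one sample requires one scan from each array in turn and that no time is lost between scans; the function names are hypothetical.

```python
SURFACE_WAVE_VELOCITY_IN_PER_S = 10_400 * 12   # plate glass, from the text


def two_way_time_s(range_inches):
    """Round-trip propagation time to a target at the given range."""
    return 2.0 * range_inches / SURFACE_WAVE_VELOCITY_IN_PER_S


def max_sampling_rate_hz(n_arrays, range_inches):
    """Upper bound when the arrays are energized one after another."""
    return 1.0 / (n_arrays * two_way_time_s(range_inches))


if __name__ == "__main__":
    full_range = 10.0 + 2.0   # usable surface plus buffer zone, in inches
    print(two_way_time_s(full_range) * 1e6, "microseconds per scan")
    print(max_sampling_rate_hz(4, full_range), "Hz with four arrays")
```

The round trip comes out slightly under 200 microseconds, so four sequential scans still fit within one millisecond.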

Electronic circuitry 

The signal processing circuitry consists of a radiator
driver, an electronic switch, and an echo receiver. The
timing circuitry digitizes the signal propagation time,
and the control logic maintains the correct operating
sequence. Figure 4 shows how these components are
interconnected.

Figure 3 — Directivity pattern for a surface wave radiator
at 8 MHz with 0.23 inch width

Figure 4 — Position encoder block schematic

The radiator driver and the arrays are matched to 
50 ohms allowing them to be connected with standard 
coaxial cable. The diode switch, with a four-pole 
double-throw action, permits the four arrays to be
multiplexed into a single driver and receiver, and it 
also isolates the receiver during the driver pulse. The 
echo receiver consists of an RF amplifier followed by 
a demodulator and a threshold detector. The receiver 
gain is electronically swept during each scan to com- 
pensate for the signal attenuation with range. A range 
gate rejects echoes originating outside of the designated 
area. Figure 5 shows the demodulator and threshold 
detector outputs for a single scan. The signal at the 
center is the echo from a finger touching the glass.

Echo timing is performed by a free running counter. 
Both up and down counting are required to digitize 
scans originating at opposite sides of the input surface. 
The coordinate grid is considered to have X and Y 
axes coincident with the edges of the usable surface, 
the origin being in the lower left corner. Adjustments 
on the range gates and counting circuitry allow the 
size and position of the coordinate grid to be varied 
slightly to permit registration with the grid of an as- 
sociated display device. 
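
The need for both up and down counting can be illustrated with a short calculation. The sketch below is an assumption-laden model rather than the actual circuit: it supposes that each array sits one buffer zone (2 inches) outside the edge of the 10 inch usable surface, that the near-side scan counts up from the origin, and that the far-side scan counts down from the opposite edge. All names are hypothetical.

```python
SURFACE_WAVE_VELOCITY_IN_PER_S = 10_400 * 12   # plate glass, from the text
BUFFER_INCHES = 2.0      # unusable zone in front of each array (from the text)
SURFACE_INCHES = 10.0    # usable surface width (from the text)


def echo_time_s(coordinate, from_far_side):
    """Round-trip time for a target at the given coordinate (model only)."""
    distance = SURFACE_INCHES - coordinate if from_far_side else coordinate
    return 2.0 * (distance + BUFFER_INCHES) / SURFACE_WAVE_VELOCITY_IN_PER_S


def coordinate_inches(time_s, from_far_side):
    """Convert an echo time back to a coordinate measured from the origin."""
    distance = SURFACE_WAVE_VELOCITY_IN_PER_S * time_s / 2.0 - BUFFER_INCHES
    # A far-side scan measures distance from the opposite edge, so the
    # coordinate decreases as the echo time grows (down counting).
    return SURFACE_INCHES - distance if from_far_side else distance


if __name__ == "__main__":
    for far in (False, True):
        t = echo_time_s(3.0, far)
        print(far, round(t * 1e6, 1), "microseconds ->", coordinate_inches(t, far))
```

Both scan directions recover the same coordinate even though their echo times differ, which is the effect that counting in opposite directions achieves in hardware.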

The control circuitry allows two modes of operation:
a continuous mode and a discrete mode. In the continuous
mode, a Data Ready pulse signals the computer
for every set of coordinates generated while stylus
contact is maintained. In the discrete mode, on the
other hand, only the location of the initial contact is
transferred to the computer. The stylus must be lifted
and repositioned to initiate another data transfer.
The discrete mode significantly reduces the amount of
data that must be handled without degrading the
response time when the device is being used for item
selection purposes only.

Figure 5 — Echo receiver response
Vertical: Upper 0.5 V/div., Lower 5.0 V/div.
Horizontal: 25 µsec/div.

In applications such as CAI which require a cluster 
of computer terminals in one location, it becomes feasi- 
ble to time-share the electronic circuitry among several 
terminals, thereby decreasing cost per unit. 

Device performance 

The complete device is shown in Figure 6 with a 
static display card behind the glass for demonstration 
purposes. It has been interfaced with a Digital Equip- 
ment Corporation PDP-8 computer for testing and 
evaluation. 

Tests have shown that stylus movements of 0.04
inch could be resolved, which corresponds to the calculated
value mentioned earlier. However, it was found
that a contact area approximately 3/4 inch in diameter
is necessary to ensure operation anywhere on the 
10 X 10 inch surface. The contact area must be as 
large as that to bridge the regions of low sensitivity 
which result from the irregularities in the surface wave 
radiation pattern. This means that even though the 
device has an inherent positional resolution of 0.04 
inch, the usable working resolution is considerably 
lower. 

When using the device with a finger, a pressure of
only a few ounces is adequate for operation over most
of the surface. In a few places, the pressure has to be
increased to enlarge the contact area sufficiently. In
the former case, pointing with a finger to items displayed
behind a seemingly ordinary glass plate is
quite natural, and, except for the parallax, a person
can make use of the device without consciously being
aware of its presence.

The position encoding is accurate and linear to about 
0.5 percent. This figure takes into account the variation 
in wave velocity due to temperature change and ma- 
terial inhomogeneity, nonlinearity of the radiated 
wavefront, and the stability of the timing circuits. 

Because scratches and marks on the glass produce 
small echoes which contribute to the background 
noise level in the receiver, some care must be used to 
keep the surface clean. The accumulation of finger- 
prints on the glass also contributes to the background 
noise. However, this is not a serious problem when the 
device is used with reasonably clean hands. 

The initial device as described has served to demon- 
strate the feasibility of using surface wave echo ranging 
as the basis for a touch-sensitive position encoder. 
The experience gained in constructing and testing 
the device has been useful in determining where im- 
provements are needed and how they should be im- 
plemented. Further computations indicate that a more 
sophisticated approach to the array design and as- 
sembly should improve the radiation pattern uni- 
formity and thereby reduce the present disparity 
between the minimum contact size and the inherent
resolution. Tests have shown that lowering the
carrier frequency to about 4 MHz should increase 
the signal-to-noise ratio of usable stylus echoes by de- 
creasing the signal attenuation and lowering the sen- 
sitivity to surface contamination. The overall conse- 
quences of these changes will be to improve the per- 
formance with medium and low resolution styli and 
also to simplify the circuitry, and hence reduce the 
cost, by using two arrays instead of four. Work is 
progressing on the construction of a device which in- 
corporates the improvements described. 
A queueing model for scan conversion


by T. W. GAY, JR. 
Kingston, New York


STATEMENT OF PURPOSE AND EXPECTED 
RESULTS 

The purpose of this paper is to present a queueing 
model for analyzing a video scan converter (VSC). 
The system analyst constantly strives for quicker 
methods, parallel approaches, and more accurate re- 
sults. Queueing theory is generally useful in the first 
and second of these categories. How then does the 
analyst develop a queueing model of a VSC in the 
hardware development and design stage? 

The model is constructed through study of the in- 
ternal functioning of the VSC and a queueing model 
is then developed which functions analogously with it. 
The queueing model developed for the VSC was an 
extension and adaptation of the known queueing model 
called “the machine interference queueing model.”
(See first section for an explanation.)

The general machine interference queueing model was 
extended and modified to permit the servicing of 
multicharacter conversions in lieu of single character 
conversions. 

The results are presented in the first two sections of 
this paper. 

INTRODUCTION AND EXPLANATION OF THE 
VIDEO SCAN CONVERTER 

Queueing analysis is a recent branch of probability 
theory which studies the characteristics and effects 
of congestion in systems subject to random flows. The 
system under study may be a supermarket, a busy 
airport, or a real-time message processor. Ideally, 
the behavior of each of these systems could be represented
in mathematical terms, the common elements
identified, and the appropriate analysis applied to
determine the expected effects of various modes of 
operation. Practically, however, no such extensive 
analysis can be carried out. This is due in part to the 
lack of complete knowledge of the system specifica- 
tions at the time analysis is required. But more im- 
portant is the present limitation of the mathematics. 

The function of the video (analog) scan converter is 
to effect the conversion of characters which have been 
generated by a computer in directed beam format into a 
video scan format. In fulfilling this function, a video 
scan converter ordinarily “paints” character(s) on the 
face of a cathode ray tube and “converts” their image 
by scanning the image with a Vidicon. The directed 
beam character appears to be painted on with no
presence of dots (or scanning lines). The painted image
is converted to a video scan character and is composed
of horizontal dots conforming to the character shape.
The smooth painted character has now become a
configuration of dots close enough together so that the
eye perceives an entire character(s).

One video scan converter is normally used to service 
a group of video displays. If a keyboard is attached 
to a video display, then the operator can enter keystrokes
through the keyboard into the computer. The
keystroke(s) are converted to the video scan format and
appear on the operator’s display screen. If characters
appear on the display, one by one, this is called “single
character conversion”, a subject not discussed in this
paper. However, characters frequently appear on the
display screen in groups (batches) because of
high traffic. This paper assesses this multiple character
conversion phenomenon.
A queueing model for the video scan converter

Explanation of the general case “machine inter- 
ference” queueing model, with development of as- 
sociated equations. (Reference: Feller, W., An In- 
troduction to Probability Theory and its Applications, 
2nd edition, New York: J. Wiley and Sons, 1957, 
Pages 416-418.) 

The machine interference model is a general class of 
queueing models. We are here specifically interested in
the “Machine Servicing With Single Serviceman.” 

This model has a finite number of customers arriving 
randomly at a single server. It was originally applied 
in Swedish industry to determine how many machines 
(customers) one setup man (server) could tend without 
undue waiting delays resulting from several machines 
requiring service at the same time. 

Assume there are “m” identical machines assigned 
to one serviceman. Each machine is in one of two 
states. 

1. “up” (operating) 

2. “down” (requiring service) 

When a machine goes “down”, it joins the queue for 
the serviceman. If the serviceman is free, he immediately
begins to service the machine. If he is busy,
the machine must wait for service. The queue (waiting 
line) is organized on a “first-in, first-out” basis. The 
design is shown in Figure 2. 

To obtain the only readily available solution, the
following assumptions are made:

1. Service time for all machines is exponentially
distributed with mean time $T_s$.

2. The “up” time for each machine is exponentially
distributed with mean time $T_a$.

These assumptions result in worst-case answers if
the actual service and “up” time distributions are
more regular.

Figure 1 — Poisson ratio function vs. service ratio

Since there is a “fixed” number of customers, we 
can readily see that the arrival rate of the machines 
to the service queue is proportional to the number still 
operating. If all machines are in the queue, the arrival 
rate is reduced to zero. Because of this “captive audi- 
ence” characteristic, the system has a built-in limiting 
effect and cannot become unstable (no infinite number 
of customers in the queue). For a relatively efficient
machine, the mean operating time, $T_a$, is
large compared to the mean servicing time, $T_s$. The ratio
of these two values is termed here the “service ratio”,
$z$:

$$z = \frac{T_a}{T_s} \tag{1}$$


If $P_k$ denotes the probability that $k$ machines are
“down”, then let $P_0$ denote the probability that all
machines are operating and the serviceman is idle: no
machines are in the waiting line or being serviced.
$P_0$ therefore represents the fraction of
time the serviceman is idle. Thus, $1 - P_0$ is the fraction
of time the serviceman is busy, and can be called
the server utilization.

Figure 2 — Model of machine servicing with single
serviceman

Hence:

$$P_k = \frac{e^{-z}\, z^{m-k} / (m-k)!}{\sum_{i=0}^{m} e^{-z}\, z^{i} / i!} \tag{2A}$$

where $P_k$ is the probability that $k$ machines are
“down”. Equation 2A is the ratio of two Poisson expressions,
both obtainable from Poisson tables, and
is known as the truncated Poisson distribution. If
$k = 0$, then Equation 2A gives the probability
of no machines in the service queue. Since $1 - P_0$ is the server
utilization, substituting $k = 0$ into Equation 2A
and subtracting it from 1 gives:

$$\text{Server Utilization} = 1 - P_0 = 1 - \frac{e^{-z}\, z^{m} / m!}{\sum_{i=0}^{m} e^{-z}\, z^{i} / i!} \tag{2B}$$

For convenience, this function has been plotted in
Figure 1, “Poisson Ratio Function versus Service
Ratio.” Given $T_a$ and $T_s$, $z$ can be calculated using
Equation 1. Given $m$, the number of individual queues,
$\Gamma_m(z)$ can be found at the intersection of $z$ and $m$ and
its value read from the “y” axis, Figure 1. Note that
$\Gamma_m(z)$ denotes server utilization.

For each machine, a breakdown is followed by a
wait for service, a service time, and an operating time
until the next time it has a breakdown. In equation
form:

$$T_b = T_w + T_s + T_a \tag{3}$$

Where:

$T_b$ is the mean time between breakdowns per
machine

$T_w$ is the mean time waiting to be serviced per
machine

$T_s$ is the mean time to service a “down”
machine

$T_a$ is the mean time a machine is “up” (operating)

The mean rate of machine breakdown is $1/T_b$. Since
there are $m$ machines, the total mean rate of machine
breakdowns entering the service queue is $m/T_b$. Each
breakdown requires a service time $T_s$. Therefore, the
server utilization, $\Gamma_m(z)$, must be:

$$\Gamma_m(z) = \frac{m\, T_s}{T_b} \tag{4}$$

But if $\Gamma_m(z)$ is already known thru use of Figure 1, then
Equation 4 can be solved for $T_b$:

$$T_b = \frac{m\, T_s}{\Gamma_m(z)} \tag{5}$$

where $T_b$ is the mean time between breakdowns.

By further examination of Equation 3, it can be seen
that the mean time a machine stays in the “down”
state is:

$$T_w + T_s = T_b - T_a = \left[ \frac{m}{z\, \Gamma_m(z)} - 1 \right] T_a \tag{6}$$

A correlation which will be made later is that $T_w + T_s$
is sometimes referred to as average response time, $T_r$.
A useful alternate form to Equation 6 is:

$$T_w + T_s = \left[ \frac{m}{\Gamma_m(z)} - z \right] T_s \tag{7}$$

which follows from substituting $T_a = z\, T_s$ for $T_a$ in Equation 6.
The mean number of down machines in the waiting
line, not including the one in service, is given by:

$$L_w = \sum_{k=1}^{m} (k - 1)\, P_k = m - (z + 1)\, \Gamma_m(z) \tag{8}$$

The mean number of all “down” machines, including
the one in service, is given by:

$$L_q = \sum_{k=1}^{m} k\, P_k = L_w + \Gamma_m(z) = m - z\, \Gamma_m(z) \tag{9}$$

Where, in Equations 8 and 9 above:

$P_k$ is the probability that $k$ machines are down

$k$ is the number of machines “down”

$m$ is the total number of machines in the system
and is a constant

$z$ is the ratio of the machine “up” time to the
machine service time

$\Gamma_m(z)$ is the server utilization
The proportion of time that a machine spends in the
“down” state is found by dividing Equation 6 by $T_b$:

$$\text{Prob (machine } k \text{ and only machine } k \text{ is “down”)} = 1 - \frac{z}{m}\,\Gamma_m(z) \qquad (10)$$


Example 

Suppose that eight machines are tended by one 
serviceman. The mean operating time of a machine is 
380 seconds, and mean service time is 34.5 seconds. 
Assume that both operating and service times are ex- 
ponentially distributed. Determine the operating 
characteristics of this system. 


The service ratio is

$$z = \frac{T_a}{T_s} = \frac{380\ \text{seconds}}{34.5\ \text{seconds}} = 11$$

a. What is the serviceman’s utilization?

Using Figure 1, with z = 11 and m = 8:

$\Gamma_m(z) = \Gamma_8(11) = .62$ = 62%, which is the
serviceman’s utilization.

b. What is the average number of “down” machines?

Using Equation 9, with m = 8, $\Gamma_8(11) = .62$,
and z = 11:

$L_q = 8 - 11(.62) = 8 - 6.82 = 1.18$, which is the
average number of machines that are “down”, either
in the waiting line or in service.

c. What is the average time a machine spends in
the “down” state?

Using Equation 6, with m = 8, $\Gamma_8(11) = .62$,
$T_a$ = 380 seconds, $T_s$ = 34.5 seconds:

$$(T_w + T_s) = \frac{m\,T_s}{\Gamma_8(11)} - T_a = \frac{8\,(34.5\ \text{sec})}{.62} - 380\ \text{sec}$$

$(T_w + T_s)$ = 445 sec − 380 sec = 65 seconds,
which is the average time a machine is “down.”

d. What fraction of the total time is a machine in
the “down” state?

Using Equation 10, with m = 8, $\Gamma_8(11) = .62$,
z = 11:

Prob = Fraction of total time = $1 - \frac{11}{8}(.62)$ = .15 = 15%
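The arithmetic of this example can be checked with a short script. The sketch below is illustrative only: the function names are hypothetical, and `server_utilization` assumes the standard truncated-Poisson form of $P_0$ for the single-serviceman machine-interference model, since Equation 2A itself is not reproduced in this passage. The worked example reads $\Gamma_8(11) = .62$ from Figure 1, so that value is passed in directly and the computed value is shown only for comparison.

```python
from math import factorial


def server_utilization(m: int, z: float) -> float:
    """Gamma_m(z) = 1 - P_0, with P_0 from an assumed truncated-Poisson ratio."""
    terms = [z**j / factorial(j) for j in range(m + 1)]
    p0 = terms[m] / sum(terms)  # probability that no machine is "down"
    return 1.0 - p0


def operating_characteristics(m: int, t_a: float, t_s: float, gamma: float) -> dict:
    """Apply Equations 5, 6, 9 and 10 for a given utilization gamma."""
    z = t_a / t_s
    t_b = m * t_s / gamma  # Equation 5: mean time between breakdowns
    return {
        "z": z,
        "t_b": t_b,
        "down_time": t_b - t_a,                # Equation 6: T_w + T_s
        "l_q": m - z * gamma,                  # Equation 9: mean number "down"
        "fraction_down": 1.0 - z * gamma / m,  # Equation 10
    }


# Values from the example: 8 machines, T_a = 380 s, T_s = 34.5 s, Gamma = .62
result = operating_characteristics(m=8, t_a=380.0, t_s=34.5, gamma=0.62)
computed_gamma = server_utilization(8, 380.0 / 34.5)
print(result)
print("utilization from the assumed formula:", round(computed_gamma, 3))
```

Because the script uses the unrounded ratio z = 380/34.5 rather than z = 11, its results differ slightly in the later digits from the hand-computed figures above.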


Let us consider the same example again, the one we 
have just used to determine operating characteristics. 
To illustrate the practicality of the case of “Machine 
Servicing With a Single Serviceman” let us transform 
the example by considering the analogies we wish to 
introduce. 


ITEMS FOUND IN ORIGINAL EXAMPLE           ANALOGOUS ITEM NOW REPLACING THE ORIGINAL

8 machines, m = 8                         8 independent sources for incoming data;
                                          m = 8

one serviceman, n = 1                     one service facility, n = 1, required to
                                          service all eight sources of data (the
                                          video scan converter)

380 seconds = the mean operating          380 milliseconds = the average inter-arrival
time, T_a, per machine (time frame        time, T_a, between characters coming from
is immaterial)                            any one source of data

34.5 seconds = the mean service           34.5 milliseconds = the average service time
time, T_s, per machine (time frame        per character of input from any source of
is immaterial)                            data, T_s

serviceman’s utilization, Γ_m(z)          utilization of video scan converter
                                          (servicer), Γ_m(z)

down machines, L_q                        total characters waiting or being serviced
                                          in the system, L_q

the average time a machine spends         the average time a character spends waiting
in the “down” state = (T_w + T_s)         for and receiving service = (T_w + T_s),
                                          response time, T_r

T_b is the average time between           the average time interval between services
“breakdowns” per machine, and is          of a specific queue; 1/T_b is the average
the sum of T_a, T_w and T_s               number of queues serviced during this time
                                          interval

Prob (machine k is in the “down”          Prob (that any character in the system is
state) = fraction of the total            waiting or is being serviced) = fraction of
time a machine is in the “down”           the total time any character in the system
state                                     is waiting or is being serviced
Transformed example continued

The service ratio is

$$z = \frac{T_a}{T_s} = \frac{380\ \text{ms}}{34.5\ \text{ms}} = 11$$

a. What is the average scan converter utilization?

Using Figure 1, with z = 11, m = 8:

$\Gamma_m(z) = \Gamma_8(11) = .62$ = 62% utilization

b. What is the average number of characters in
the system?

Using Equation 9, with m = 8, $\Gamma_8(11) = .62$,
z = 11:

$L_q = m - z\,\Gamma_m(z) = 8 - 11(.62) = 8 - 6.82 = 1.18$
characters on the average are in the system waiting
or being serviced.

c. What is the average response time per character?

Using Equation 6, with m = 8, $\Gamma_8(11) = .62$,
$T_a$ = 380 ms, $T_s$ = 34.5 ms:

$$(T_w + T_s) = \frac{m\,T_s}{\Gamma_8(11)} - T_a = \frac{8\,(34.5\ \text{ms})}{.62} - 380\ \text{ms}$$

$(T_w + T_s)$ = [445 ms − 380 ms] = 65 ms,
average response per character

d. What fraction of the total time does a character
spend waiting for or being serviced?

Using Equation 10, with m = 8, $\Gamma_8(11) = .62$,
z = 11:

Fraction of total time = $1 - \frac{z}{m}\,\Gamma_8(11) = 1 - \frac{11}{8}(.62)$ = .15 = 15%

Extension and adaptation of the general case “machine
interference” queueing model to permit multiple
character updates per service cycle

Consider now that we wish to adapt the single-character
update model to one which is capable of representing
a multiple-character update. Specifically, we mean the
ability to service “N” characters coming from the
same source and residing in the same queue in the same
34.5 millisecond service cycle. In effect, the service
time per character reduces to:

$$T_s' = \frac{T_s}{N} = \frac{34.5\ \text{milliseconds}}{N\ \text{characters}} \qquad (11)$$


We are especially interested in the response time,
$T_r$, since this provides a measure of “machine
responsiveness” to a keyboard operator entering a character
stream into the system. $T_r$ is meant to be the average
response time per character, since the response time for
the first character will be longer than that of the last
character awaiting service from the same source.

As with our previous model, our service ratio is
defined as:

$$z = \frac{T_a}{T_s} \qquad (12)$$

Also using Figure 1, the server utilization, $\Gamma_m(z)$,
can be found at the intersection of z and m, and its
value read from the “y” axis.

$$\text{Let } T_r^* = (T_w + T_s) = \left[\frac{m}{\Gamma_m(z)} - z\right] T_s \qquad (13)$$

During the time interval between services, $T_b$, the
number of characters which arrive at a specific queue
is:

$$N = \frac{T_r^* + T_a}{T_a} = \frac{T_r^*}{T_a} + 1 \qquad (14)$$

Where N is the character content of an individual
queue and is the average number of characters serviced
as a “batch”.


Referring to Figure 3, in a typical multiple character
service there are N characters and N − 1 time intervals,
$T_a$, between the characters. $T_a$ is the average
inter-arrival time of the incoming character stream. As the
wait time becomes longer, more characters arrive at
the individual queue, awaiting service simultaneously
with the first character in the queue. When the queue
is serviced, all characters residing in the queue at that
point in time are serviced in the same constant service
time of 34.5 milliseconds for all N of them. Note that
the service time per character has been effectively
reduced to 34.5 ms/N characters.

The response time per character must reflect,
however, that all N characters did in fact require 34.5 ms
each while they were serviced as a “batch”.

The concept used to obtain the average response time
per character was to first separate the waiting time
from the service time portion found in the first character
response time:

$$T_w = (T_r^* - T_s) \qquad (15)$$

Where $T_w$ is the waiting time only of the first character,
$T_s$ is the service time of the first character, and
$T_r^*$ is the response time for the first character.

The sum of the individual character waiting times
(see Figure 3) is:

$$T_{sum} = \sum_{n=0}^{N-1} (T_w - n\,T_a) \qquad (16)$$

The average waiting time per character is:

$$T_w\ (\text{average}) = \frac{\sum_{n=0}^{N-1} (T_w - n\,T_a)}{N} \qquad (17)$$

Figure 3 - Time profile of 5 characters (N = 5) awaiting service in the ith queue, and the summation of total time, $T_w$, N = 5. For the case N = 5, the sum of the individual waiting times is $[(T_w) + (T_w - T_a) + (T_w - 2T_a) + (T_w - 3T_a) + (T_w - 4T_a)] = \sum_{n=0}^{N-1} (T_w - n\,T_a)$, where $T_w = (T_r^* - T_s)$; the average response time per character is this sum divided by N, plus $T_s$.

Since the service time was subtracted out in Equation 15,
it must be re-entered so that each and every
character in the “batch” is charged with $T_s$. Inserting
$T_s$ into Equation 17 gives the average response time
per character, $T_r$:

$$T_r = \frac{\sum_{n=0}^{N-1} (T_w - n\,T_a)}{N} + T_s \qquad (18)$$

Equation 18 is important since it is the mathematical
expression we originally set out to find. The reader
should note that $T_r$ is the Overall Average Response
Time per Character. The following example should be
of interest.

Example

Assume m = 32 independent sources of input
character streams, each assigned to an individual
queue. Each queue is serviced on a “first come, first
served” priority, as determined by the arrival of
the first character to enter the individual queue. Let
$T_a$ = 313 milliseconds, $T_s$ = 34.5 milliseconds. Find
$T_r$, the Overall Average Response Time per Character.

Using Equation 12, the service ratio, z, is:

$z = T_a/T_s$ = 313 ms/34.5 ms = 9

Using Figure 1, the server utilization is $\Gamma_{32}(9) = 1.0$,
where m = 32, z = 9.

Using Equation 13, the first character response time,
$T_r^*$, is:

$$T_r^* = \left[\frac{32}{1.0} - 9\right] 34.5\ \text{ms} = 795\ \text{ms}$$

where m = 32, z = 9, $\Gamma_{32}(9) = 1.0$, and $T_s$ = 34.5 ms.

Using Equation 14, the character content of an
individual queue, N, is:

$$N = \frac{795\ \text{ms}}{313\ \text{ms}} + 1 = 3.54\ \text{characters}$$

N = 3.54 is the average character content of an
individual queue. The characters are serviced once per
cycle.

Using Equation 15, the average wait time for the
first character, $T_w$, is:

$T_w$ = (795 ms − 34.5 ms) = 760.5 ms
Using Equation 16, the sum of the individual character
waiting times is:

$T_{sum} = [(T_w) + (T_w - T_a) + (T_w - 2T_a)]$

$T_{sum}$ = [(760.5 ms) + (760.5 ms − 313 ms) +
(760.5 ms − 2 × 313 ms)]

$T_{sum}$ = [(760.5) + (447.5) + (134.5)] = 1,342.5 ms

Using Equation 18, the overall average response time
per character, $T_r$, is:

$$T_r = \frac{1{,}342.5\ \text{ms}}{3.54\ \text{char.}} + 34.5\ \text{ms} = 414.5\ \text{ms per character}$$

As compared to the value obtained with a simulation
model, the % difference is:

$$\%\ \text{Difference} = \frac{(414.5 - 390)\ \text{ms}}{390\ \text{ms}} \times 100\% = +6.3\%\ \text{Difference}$$
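The chain of Equations 13 through 18 can be sketched as a small script. This is a minimal illustration under stated assumptions, not the author's program: the function names are hypothetical, and the sum of Equation 16 is taken over the whole characters in the batch (floor of N), which matches the three terms used in the worked example for N = 3.54.

```python
import math


def first_char_response(m: int, z: float, gamma: float, t_s: float) -> float:
    """Equation 13: response time of the first character in a queue."""
    return (m / gamma - z) * t_s


def average_response(t_r_star: float, t_a: float, t_s: float):
    """Equations 14 to 18 for a given first-character response time."""
    n = t_r_star / t_a + 1.0  # Equation 14: characters per batch
    t_w = t_r_star - t_s      # Equation 15: wait of the first character
    # Equation 16, assuming one term per whole character in the batch
    t_sum = sum(t_w - k * t_a for k in range(math.floor(n)))
    t_r = t_sum / n + t_s     # Equation 18: overall average response time
    return n, t_w, t_sum, t_r


# Inputs of the worked example, using the stated T_r* = 795 ms
n, t_w, t_sum, t_r = average_response(t_r_star=795.0, t_a=313.0, t_s=34.5)
print(f"N = {n:.2f}, T_w = {t_w} ms, T_sum = {t_sum} ms, T_r = {t_r:.1f} ms")

# Equation 13 evaluated with the rounded inputs m = 32, z = 9, Gamma = 1.0
print(first_char_response(32, 9.0, 1.0, 34.5))
```

With unrounded intermediate values the script gives a $T_r$ slightly below the 414.5 ms of the worked example, and Equation 13 with the rounded inputs evaluates to 793.5 ms rather than the stated 795 ms; both gaps are small next to the differences from the simulation model.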

Presentation of results with comparison to a 
simulation model 



Table 1 and Figure 4 show the computed and simu- 
lated values from the queueing model described in 
an earlier section and a simulation (GPSS) model respec- 
tively. The purpose of simulating the video scan con- 
verter was to establish validity of the queueing model 
results, within a range of + or — 20%. 

The argument is valid that error in modeling can 
exist in: 

a. The queueing model 

b. The simulation model

c. Both models 

Of paramount importance, however, is the under- 
lying principle that the probability is least that both 
models will be in error. As a general rule for confirming 
validity: 

a. Values from both models should be in the same
“ballpark.”

b. Output values from both should increase
or decrease as independent variables are changed
by like amounts.

c. Both should produce approximately the same slope
of values.

d. Both should provide a reasonable division of positive
and negative % “differences” over the range of the
model’s output.


Figure 4 — Attached video displays with keyboard, m 


Using these criteria to determine validity, the
following is my evaluation of the results:

a. The values differed by 10% maximum (at
m = 16, the queueing model value for $T_r$ is
162.0 milliseconds versus $T_r$ = 180 milliseconds
for the simulation model).

b. Over the complete range of m = 0 thru m =
32, the values of $T_r$ from both models increased
as m was increased a like amount.

c. The slope of $T_r$ values from both models differed
over the range of m = 0 thru m = 32. They
were:

Range               Queueing Model      Simulation Model

m = 0 thru 8        Slope = + 4.77      Slope = + 5.06
m = 8 thru 16       Slope = +11.16      Slope = +13.12
m = 16 thru 32      Slope = +15.78      Slope = +13.12

Totals                      +31.71              +31.30

Even though the slopes are somewhat different,
they are not appreciably so. It appears the
queueing model does not take into account some
factor at the low end of m and somewhat over
compensates at the high end of m.

d. As shown in Table 1, the maximum negative
difference is −10.0%, the maximum positive
difference is +6.3%. This is calculated as follows:

$$\%\ \text{Difference} = \frac{(\text{queueing value, } T_r) - (\text{simulated, } T_r)}{(\text{simulated, } T_r)} \times 100\% \qquad (19)$$

There appears to be a reasonable division between
positive and negative % differences.

TABLE I - Computation of values for Figure 4:
Overall average response time per character, T_r

m                     1          4          8          16         32           EQUATION

Γ_m(z)                .10        .43        .77        .995       1.0          FIG. 1
T_r*                  34.5 ms    45.0 ms    83.0 ms    276.0 ms   795.0 ms     13
N                     1.110      1.146      1.270      1.893      3.540        14
T_w                   0.00 ms    10.5 ms    48.5 ms    241.5 ms   760.5 ms     15
ΣT_w                  0.00 ms    10.5 ms    48.5 ms    241.5 ms   1,342.5 ms   16
ΣT_w/N                0.00 ms    9.15 ms    38.2 ms    127.5 ms   380.0 ms     17
T_r = ΣT_w/N + T_s    34.5 ms    43.65 ms   72.7 ms    162.0 ms   414.5 ms     18
T_r (SIM.)            34.5 ms    46.00 ms   75.0 ms    180.0 ms   390.0 ms     SIM. MODEL
DIFFERENCE            0.00 ms    -2.35 ms   -2.3 ms    -18.0 ms   +24.5 ms
% DIFFERENCE          0.00%      -5.1%      -3.07%     -10.0%     +6.3%        19

z = T_a/T_s = 278.0 ms/34.5 ms = 8 for m = 1, 4, 8, 16
z = T_a/T_s = 313.0 ms/34.5 ms = 9 for m = 32
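The percentage differences of Table I follow directly from Equation 19. The short sketch below recomputes them from the tabulated $T_r$ values; the variable and function names are illustrative and not part of the original study.

```python
def percent_difference(queueing: float, simulated: float) -> float:
    """Equation 19: signed difference relative to the simulated value."""
    return (queueing - simulated) / simulated * 100.0


# T_r values in milliseconds, taken from Table I, keyed by m
queueing = {1: 34.5, 4: 43.65, 8: 72.7, 16: 162.0, 32: 414.5}
simulated = {1: 34.5, 4: 46.0, 8: 75.0, 16: 180.0, 32: 390.0}

differences = {
    m: round(percent_difference(queueing[m], simulated[m]), 2) for m in queueing
}
for m, diff in differences.items():
    print(f"m = {m:2d}: {diff:+.2f}%")
```

The largest negative and positive entries correspond to the −10.0% and +6.3% quoted in item d.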


CONCLUSIONS 

The queueing model as presented, in my opinion,
provides a very satisfactory mathematical representation
of a video scan converter, and a better one than
originally anticipated.
Character generation from resistive
storage of time derivatives 

by MICHAEL L. DERTOUZOS 

Massachusetts Institute of Technology 
Cambridge, Massachusetts 


INTRODUCTION 

Recent advances in man-machine communication 
have stimulated increased interest in techniques and 
special circuits that generate characters, for graphical 
and alphanumeric Cathode-Ray-Tube (CRT) display 
terminals, at the display site. The primary advantage 
in employing such local character generation is
compression of the data that is required to store and
communicate a character from the computer to the
display: a single binary word of length n is all that is
required to instruct the character generator to
display one of $2^n$ possible characters. The primary
disadvantage of local character generation is display
cost, for it is generally considerably less expensive 
to generate characters from a longer sequence of more 
elementary commands — for example commands that 
cause the CRT beam to move right, left, up or down 
by a minimum resolvable increment. Besides these 
conflicting costs of data storage and transmission 
versus local-display generation, several other less 
tangible criteria such as character stability and fi- 
delity (aesthetics), are instrumental in the design and 
evaluation of a local character-generation approach. 

This paper discusses a character-generation tech- 
nique which requires, for each character, the storage 
in a resistive memory of the time derivative functions 
for the horizontal and vertical CRT deflection signals. 
The first section of the paper describes specific geometri- 
cal primitive segments that can compose a large class 
of characters and symbols; the choice of such primi- 
tives is important, since it affects directly the quality 
of the displayed characters and the display cost. Also 


given in this section is a complete list of primitive 
sequences for the 94-character ASC-II set. The second 
section of the paper describes a character-generation 
system that stores the above primitives in a resistor 
matrix, and uses them to compose desired characters 
on a CRT display. In the third section, this approach 
is evaluated and compared to more conventional 
methods of dot intensification, in terms of cost, speed, 
and fidelity. 

Character primitives 

Characters and symbols, generated on CRT dis- 
plays, are made up of certain elementary graphical 
segments. Character primitives over a character set 
will be called those segments which are (i) atomic or 
indivisible to smaller segments, and (ii) sufficient in 
number and quality to compose within acceptable 
accuracy every character in that set. At one extreme, 
the points of a uniformly spaced grid are adequate 
character primitives (Figure la); however, as the 
number of these points is reduced (Figures lb and c), 
it becomes progressively more difficult to recognize 
the displayed characters. At the other extreme, the 
set of all characters may be considered itself as a set 
of character primitives. This set, however, is not very 
useful, for while it is generally easy to construct a 
system capable of implementing the primitives of 
Figure 1, it is considerably more difficult and expen- 
sive to implement the primitives at the other extreme. 
Conversely, it takes only seven bits to specify one
of the 94 characters of the ASC-II set, while it takes
49 bits to specify every one of the possible subsets of
dots of Figure 1b. These simple observations on the
above two extremes are characteristic of the problems
of character generation and of the objectives in the
design of an effective character generator, that is,
the desirability of a small number of primitives which
can be economically implemented.

Figure 1 - Points as character primitives: (a) 169 points, (b) 49 points, (c) 16 points

The primitives used in the character generation
technique of this paper are continuous strokes which
are either (i) straight lines or (ii) so-called “cusps”.
A straight-line primitive is specified relative to a
point P by increments $\Delta_x$, $\Delta_y$ which are real numbers;
in our notation each such primitive is denoted, when
visible, by $(\Delta_x, \Delta_y)$ or, when invisible, by an
underscore, $\underline{(\Delta_x, \Delta_y)}$. Figure 2a shows two such primitives.
The equation of primitive $(\Delta_x, \Delta_y)$ relative to a
coordinate center at point P is as follows:

$$\frac{x}{\Delta_x} = \frac{y}{\Delta_y} \quad \text{for } 0 \le \frac{x}{\Delta_x} \le 1,\ 0 \le \frac{y}{\Delta_y} \le 1 \qquad (1)$$

where x and y are the horizontal and vertical coordinates
of every point on that primitive.



Figure 2 — Straight/cusp primitives 


The cusp primitive, on the other hand, is specified
relative to a point P by increments $\Delta_x$, $\Delta_y$, which are
real; moreover, one of these increments is overscored,
and is called the cusp increment; that is, either $(\overline{\Delta_x}, \Delta_y)$
or $(\Delta_x, \overline{\Delta_y})$ are valid cusp primitive notations.
Geometrically, a cusp primitive is, as shown in Figure
2b, contained in a rectangle of dimensions $\Delta_x$, $\Delta_y$;
the curved segment corresponding to the overscored
increment is obtained by dividing the other increment
into three equal parts, fitting a straight line in the
middle section and a parabola in each of the other two
sections so that the parabolas are tangent to the above
straight line. More precisely, the cusp $(\Delta_x, \overline{\Delta_y})$, shown
normalized in Figure 2c, is given, relative to a
coordinate center at point P, by

In Region I $\left(0 \le \frac{x}{\Delta_x} \le \frac{1}{3}\right)$:

$$\frac{y}{\Delta_y} = 1 - \left(1 - 3\,\frac{x}{\Delta_x}\right)^2 \qquad \text{(a)}$$

In Region II $\left(\frac{1}{3} \le \frac{x}{\Delta_x} \le \frac{2}{3}\right)$:

$$\frac{y}{\Delta_y} = 1 \qquad \text{(b)} \qquad (2)$$

In Region III $\left(\frac{2}{3} \le \frac{x}{\Delta_x} \le 1\right)$:

$$\frac{y}{\Delta_y} = 1 - \left(3\,\frac{x}{\Delta_x} - 2\right)^2 \qquad \text{(c)}$$

The cusp $(\overline{\Delta_x}, \Delta_y)$ is obtained from Equations (2) by
interchanging literal x with literal y everywhere in
these equations. A cusp is always visible. These
apparently mysterious primitives are justified on two
counts: (i) ability to represent a large class of characters
and symbols with a small number of primitives,
as discussed immediately below, and (ii) ease of
implementation, as discussed in the following section.
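The three regions of Equation (2) can be sketched as one function of the normalized abscissa. This is an illustrative reading of the equation, taking Region I as $1 - (1 - 3u)^2$, Region II as 1, and Region III as $1 - (3u - 2)^2$ with $u = x/\Delta_x$; the function name is hypothetical.

```python
def cusp_profile(u: float) -> float:
    """Normalized cusp y/dy as a function of u = x/dx, per Equation (2)."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u = x/dx must lie in [0, 1]")
    if u <= 1.0 / 3.0:
        return 1.0 - (1.0 - 3.0 * u) ** 2  # Region I: rising parabola
    if u <= 2.0 / 3.0:
        return 1.0                         # Region II: straight middle section
    return 1.0 - (3.0 * u - 2.0) ** 2      # Region III: falling parabola


# Sample the profile at a few points across the three regions
samples = [(u / 6.0, round(cusp_profile(u / 6.0), 4)) for u in range(7)]
print(samples)
```

The overscored increment rises from zero, stays flat over the middle third, and returns to zero, so the two parabolas meet the straight section with zero slope, as the tangency condition in the text requires.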

A character or symbol is composed from a sequence
of these two types of primitives; here the first primitive
is specified relative to the lower left corner of the
character field, and each subsequent primitive is
specified relative to the terminating point of the
preceding primitive. For example, capital letter A
is formed in Figure 3a by the primitive sequence

$$S_A = (.45, 1.2)\ (.45, -1.2)\ \underline{(-.788, .3)}\ (.676, 0)$$

Observe that the first segment is a visible straight
primitive which starts at the lower left corner and
terminates at the point [.45, 1.2]. The second segment
is again a visible straight primitive, which starts at
point [.45, 1.2] and terminates .45 units to the right
and 1.2 units below that point. Observe further that
the third segment is invisible, and that the direction
and order in the sequence of each primitive is shown
adjacent to each segment in Figure 3a. Capital letter
P of Figure 3b is formed by the primitive sequence

$$S_P = (0, 1.2)\ (.4, 0)\ \underline{(-.4, -.5)}\ (.4, 0)\ (\overline{.2}, .5)$$

Here, the first four primitives are straight, with the
third primitive invisible. The fifth primitive, however,
is a cusp which starts at the point [.4, .7] and ends at
the point [.4, 1.2].

Figure 3 - Character composition by straight/cusp primitives
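The chaining rule for primitives can be sketched in a few lines: each primitive starts where the previous one ended, and a cusp returns to zero along its overscored increment, so it contributes net displacement only along the other axis. The tuple encoding and the `kind` labels below are illustrative choices, not notation from the paper.

```python
def trace_endpoints(primitives, start=(0.0, 0.0)):
    """Return the terminating point of each primitive in a sequence.

    Each primitive is (dx, dy, kind), where kind is "line", "invisible",
    "cusp_x" (overscored x increment) or "cusp_y" (overscored y increment).
    """
    x, y = start
    points = [(x, y)]
    for dx, dy, kind in primitives:
        if kind == "cusp_x":
            dx = 0.0  # the overscored x increment returns to its start
        elif kind == "cusp_y":
            dy = 0.0  # the overscored y increment returns to its start
        x, y = x + dx, y + dy
        points.append((round(x, 6), round(y, 6)))
    return points


LETTER_A = [(0.45, 1.2, "line"), (0.45, -1.2, "line"),
            (-0.788, 0.3, "invisible"), (0.676, 0.0, "line")]
LETTER_P = [(0.0, 1.2, "line"), (0.4, 0.0, "line"),
            (-0.4, -0.5, "invisible"), (0.4, 0.0, "line"),
            (0.2, 0.5, "cusp_x")]

points_a = trace_endpoints(LETTER_A)
points_p = trace_endpoints(LETTER_P)
print(points_a)
print(points_p)
```

For letter P the trace reproduces the points named in the text: the cusp starts at [.4, .7] and ends at [.4, 1.2].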

Figure 4 shows the primitive sequences corresponding 
to all 94 alphanumeric characters and symbols of the 
ASC-II code. This Table is arranged exactly as the 
table of the ASC-II code for reference purposes. Some 
statistics of interest here are as follows: 

1. The average number of primitive segments per 
character is 4.43. 

2. The maximum number of primitives per charac- 
ter is eight. 

3. The total number of different magnitudes for 
the primitive increments is 13. 

4. No character uses more than two cusp primitives;
these primitives occur (intentionally) either
at the fifth, at the seventh, or at both the fifth
and seventh segments of that character’s primitive
sequence.*

Of the above observations, 1, 2, and 3 indicate that
a relatively small number of primitives can form a
relatively large class of symbols. The fourth, as well
as the other observations above, will be used in the
following section in connection with the implementation
of this character generation technique.

*or they can be made to occur at these segment positions by
introducing primitives (0, 0) anywhere in the sequence.

Figure 4 - Straight/cusp primitive sequences for 94-character ASC-II set


The character generator 

A local character generator for a CRT display is 
generally a system (Figure 5) with input a seven-bit 
word, denoting a character, and output two deflection 
and one beam-intensification waveforms (functions 
of time), which when applied to the CRT deflection 
and beam controls, respectively, display that character 
relative to beam position, $x_P$ and $y_P$. Character and
line spacing is usually accomplished by a control unit
external to the generator, which varies $x_P$ and $y_P$ upon
completion of each character and line, respectively. 
If the CRT display module is of the refresh type, then 
the codes of characters to be displayed are stored in 
a local storage medium, usually a delay line, and are 
presented periodically, usually every 1/30 to 1/40 
sec to the character generator. If the CRT display 
module is of the storage type, then the character 
generator generates the waveforms x, y and b only 
once for each character to be displayed, and the cor- 
responding character is stored on the screen of the CRT. 

Any given character primitive y = f(x) can be
generated by such a system in an infinite number of 
ways, since for every one of many possible choices 
for a horizontal deflection waveform x(t), where t 
is time, there is always a vertical deflection waveform 
y(t) = f(x(t)) which when applied simultaneously with 
x(t), causes the CRT beam to trace the primitive y = 
f(x). Two particular types of waveforms, s(t) and 
c(t) were chosen to implement the primitives of the 
preceding section; they are shown in Figure 6a, and 
their time derivatives in Figure 6b. 

A straight-line primitive about any point is generated
by applying waveform s(t), appropriately scaled, to
both the horizontal and vertical axes. Thus, setting

$$\begin{aligned} x(t) &= \Delta_x\, s(t) + x_1 && \text{(a)} \\ y(t) &= \Delta_y\, s(t) + y_1 && \text{(b)} \end{aligned} \qquad (3)$$

where $\Delta_x$ and $\Delta_y$ are real numbers, results in a straight
line primitive from $[x_1, y_1]$ to $[x_1 + \Delta_x, y_1 + \Delta_y]$ given
by

$$\frac{x - x_1}{\Delta_x} = \frac{y - y_1}{\Delta_y} \qquad (4)$$

and shown in Figure 6c. This is the desired primitive
of Equation (1).

Figure 5 - Local character generator

A cusp primitive about any point is generated by
applying waveform s(t) to one axis and waveform c(t)
to the other, after each waveform has been
appropriately scaled. Figure 6d shows the resulting segment
when s(t) is applied to the horizontal axis, and c(t)
to the vertical axis, and Figure 6e shows a segment
obtained with different scaling and interchange of the
two waveforms. More generally, setting

$$\begin{aligned} x(t) &= \Delta_x\, s(t) + x_1 && \text{(a)} \\ y(t) &= \Delta_y\, c(t) + y_1 && \text{(b)} \end{aligned} \qquad (5)$$

where $\Delta_x$ and $\Delta_y$ are real numbers, yields a cusp
primitive about point $[x_1, y_1]$ described as follows:

for $0 \le \frac{x - x_1}{\Delta_x} \le \frac{1}{3}$:

$$y = y_1 + \Delta_y \left[1 - \left(1 - 3\,\frac{x - x_1}{\Delta_x}\right)^2\right] \qquad \text{(a)}$$

for $\frac{1}{3} \le \frac{x - x_1}{\Delta_x} \le \frac{2}{3}$:

$$y = y_1 + \Delta_y \qquad \text{(b)} \qquad (6)$$

for $\frac{2}{3} \le \frac{x - x_1}{\Delta_x} \le 1$:

$$y = y_1 + \Delta_y \left[1 - \left(3\,\frac{x - x_1}{\Delta_x} - 2\right)^2\right] \qquad \text{(c)}$$

Equation (6) is identical in form to the desired cusp
primitive, given by Equation (2). Since Equations
(4) and (6) implement exactly all the primitives of
the previous section, about any point $(x_1, y_1)$, it
remains only to provide means for forming a string of
primitives, so that all the characters of Figure 4 may
be implemented.

Figure 6 - Waveforms for straight/cusp primitives

The formation of strings of primitives, that is of
characters and symbols, is accomplished by
concatenating the derivative waveforms of Figure 6b, for
each primitive segment, after they have been scaled
by $\Delta_x$ and $\Delta_y$. Such waveforms, denoted by (1/T)dx/dt
and (1/T)dy/dt (T constant), are shown for letter P
on the top half of Figure 7; subsequent integration
in time of these waveforms yields the deflection
waveforms x(t) and y(t), shown on the lower half of Figure
7. Also shown in Figure 7 is the beam waveform b(t)
which turns the beam off in the third time segment
2T < t < 3T. The character resulting from
simultaneous application of these x(t) and y(t) waveforms
on the CRT is the letter P of Figure 3b, specified by
the primitive string:

$$S_P = (0, 1.2)\ (.4, 0)\ \underline{(-.4, -.5)}\ (.4, 0)\ (\overline{.2}, .5)$$

Observe that these five primitives correspond to and
are ordered as the five time segments of Figure 7.

Figure 7 - Composition of CRT deflection and beam waveforms
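The concatenate-then-integrate idea can be illustrated numerically for letter P. The sketch below rests on assumptions that the text does not spell out: s(t) is taken to be a linear ramp from 0 to 1 over each segment (so its derivative is a rectangular pulse), c(t) is taken to follow the cusp shape of Equation (2) in normalized time, and each segment has unit duration. The function names and the midpoint-rule integrator are illustrative.

```python
def ramp_rate(tau: float) -> float:
    """Derivative of the assumed ramp s: a rectangular pulse of height 1."""
    return 1.0


def cusp_rate(tau: float) -> float:
    """Derivative of the assumed cusp waveform c in normalized time tau."""
    if tau < 1.0 / 3.0:
        return 6.0 * (1.0 - 3.0 * tau)
    if tau < 2.0 / 3.0:
        return 0.0
    return -6.0 * (3.0 * tau - 2.0)


def integrate(segments_x, segments_y, steps=300):
    """Integrate concatenated, scaled derivative pulses with the midpoint rule."""
    x = y = 0.0
    path = [(x, y)]
    h = 1.0 / steps  # steps is a multiple of 3 so no step straddles a breakpoint
    for (ax, fx), (ay, fy) in zip(segments_x, segments_y):
        for i in range(steps):
            tau = (i + 0.5) * h
            x += ax * fx(tau) * h
            y += ay * fy(tau) * h
            path.append((x, y))
    return path


# Letter P: (0,1.2) (.4,0) (-.4,-.5) (.4,0) and a cusp with overscored x = .2
P_X = [(0.0, ramp_rate), (0.4, ramp_rate), (-0.4, ramp_rate),
       (0.4, ramp_rate), (0.2, cusp_rate)]
P_Y = [(1.2, ramp_rate), (0.0, ramp_rate), (-0.5, ramp_rate),
       (0.0, ramp_rate), (0.5, ramp_rate)]

path = integrate(P_X, P_Y)
end_x, end_y = path[-1]
max_x = max(px for px, _ in path)
print(round(end_x, 6), round(end_y, 6), round(max_x, 6))
```

Under these assumptions the beam ends at [.4, 1.2] and the bowl of the P bulges out to x = .6, consistent with a cusp increment of .2 added to the stem position .4.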

One way of implementing this character-generation
approach is shown in Figure 8. Here, sixteen lines
carry eight rectangular constant-amplitude voltage
pulses $P_i$ and their negatives, and four lines carry
two cusp-derivative pulses $C_i$ and their negatives.
Waveforms and relative timing of these pulses are
shown on the top center of Figure 8. Operation of
the system is as follows: a character to be displayed
is specified to the decoder shown on the right side of
Figure 8 by a seven-bit binary word. This word is
“decoded”, so that one of the 128 output lines of the
decoder, say the line marked P, becomes energized.
That line turns “on” the three analog switching devices
to which it is connected, and starts the timing sequences
of the $P_i$ and $C_i$ pulses. The dx/dt, dy/dt and b
waveforms for the selected character are formed by resistive
mixing of the above pulses in three groups, respectively.
For the case under discussion, letter P is “stored” in
the values and manner of interconnection of eight
resistors shown enclosed by dashed lines. Here, the
top four resistors mix pulses $P_2$, $-P_3$ and $P_4$, all equally
weighted by a conductance of .4 units, with waveform
$C_5$, which the fourth resistor in that group weighs by .2. As
a consequence of this mixing, the resulting current in
the so-called x bus is the weighted sum of all these
waveforms and is identical to the dx/dt waveform of
Figure 7. The next group of three resistors, having
conductances 1.2, .5 and .5 respectively, forms in a
similar manner a current in the y bus which is the dy/dt
waveform of Figure 7. Finally, the complement of the
beam waveform of Figure 7 is formed by the last group,
consisting of one resistor of unit conductance, as a
current in the b bus. The dx/dt and dy/dt currents are
subsequently amplified by low-input-impedance
amplifiers A and integrated in time to yield the $x_A(t)$ and
$y_A(t)$ waveforms of Figure 8. These are identical to
the desired x(t) and y(t) waveforms of Figure 7.
These waveforms are, in turn, summed with the
constants $x_P$, $y_P$, and the beam waveform is inverted,
resulting in a display of character P about point
$[x_P, y_P]$. At the end of this sequence, the integrators
are reset to zero output and the analog switching
devices are turned off, thereby making the character
generator ready for display of the next requested
symbol. Also shown in Figure 8 is the resistor “memory”
for character 1; the reader may verify that when this
character is selected, the system does indeed generate
the primitive sequence for that character, shown in
Figure 4. Observe finally that the system of Figure
8 has two rather than eight cusp lines, $C_i$, which are
active at timing positions five and seven. The reason
for this choice is one of economics, since, as we discussed
in connection with Figure 4, it has been established over
a large class of characters and symbols that these
pulses at such relative positions are quite adequate.

Figure 8 - Character generator implementation
Figure 9 - Implemented characters (Courtesy of Computek Inc., Cambridge, Mass.)


A photograph of characters and symbols generated 
by such a system is shown in Figure 9. 

An alternative realization of the above character-generation 
technique would be to store, for each character, 
k bits in a digital read-only memory. These bits 
would, in turn, control a resistive mixing network 
common to all characters by varying its conductances 
in discrete steps. Such an implementation, 
however, requires approximately k = 90 bits 
of storage per character and is considerably less 
economical than the system of Figure 8. 

COMPARISONS AND CONCLUSIONS 

Ultimately, the merits and disadvantages of a character 
generator rest on economic and aesthetic criteria. The 
former are very strongly dependent on technology and 
are subject to rapid change, while the latter are, beyond 
a certain point, quite subjective. Nevertheless, certain 
conclusions can be drawn. 

First, the use of stroke primitives such as straight 
lines and cusps results in more economical character 


storage than the use of points; and the relative advan- 
tage of such storage increases, over a certain range, 
with finer resolution. Consider, for example, that every 
character is formed on a grid of n^2 points. A straightforward 
point-intensification or incremental-stroke 
scheme on such a “dot-matrix” would require the 



Figure 10 — Memory growth versus resolution 

storage of n^2 bits per character, indicating the points 
which must be intensified. The corresponding memory 
growth curve, giving the number of diode components 
per character, is shown in Figure 10 and is labeled 
“read-only memory”. From Figure 4, however, we 
know that the average number of segments per 
character, over the 94-character ASCII set, is 4.43 
for the approach of this paper. Each segment, in turn, 
requires two resistors, for x and y. We also know from 
Figure 4 that there will be of the order of 1.4 resistors 
per character for the beam. Hence, the average number 
of resistors per character is constant, or 

2(4.4) + 1.4 ≈ 10 

In addition, each character requires three analog 
switching devices (FETs) and their driver, or of the 
order of five components. Thus, the total number of 
components is 15 per character, remaining constant 
within the limit of analog resolution, or n < 100, as 
shown by the graph labeled “resistive memory” in 
Figure 10. With present technology, it is more economical 
to construct the character generator out of 
discrete components; the resulting cost is, for an acceptable 
resolution n^2 = 240, lower than that of a 
read-only memory of % the resolution. With forth- 
coming technology, the above ten resistors and five 
active components should each cost about as much 
as a diode; hence an even better cost advantage can 
be expected. Observe, however, as was indicated above, 
that resolution cannot exceed that of analog circuitry, 
since the storage and generation of characters is analog 
in nature. On the other hand, the CRT is an analog 
device, on which resolutions higher than analog cannot 
be effectively used. The above savings in character 
storage result in lower generator cost and reduced 
generator size. 
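
The component counts quoted above can be checked with a few lines of arithmetic. The following Python sketch uses only the figures given in the text (n^2 diode components per character for the read-only memory; two resistors per segment, about 1.4 beam resistors and about five switching components for the resistive memory). The function names are hypothetical and serve only to illustrate the comparison.

```python
def rom_components(n):
    """Diode components per character for a dot-matrix read-only memory
    storing one bit per point of an n-by-n grid."""
    return n * n


def resistive_components(segments=4.43, beam_resistors=1.4, switching=5):
    """Components per character for the resistive memory.

    Each segment needs two resistors (x and y); the count does not
    depend on the grid resolution n.
    """
    resistors = round(2 * segments + beam_resistors)  # about ten resistors
    return resistors + switching


for n in (4, 8, 16):
    # The read-only memory grows as n^2; the resistive memory stays constant.
    print(n, rom_components(n), resistive_components())
```

Under these figures the resistive memory stays at 15 components per character, while the read-only memory already needs 16 at n = 4 and grows quadratically from there.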

Second, the speed of character generation of such 
a stroke technique is of the same order as that of 
dot-intensified character generation, since the current 
through resistors will generally change over its minimum 
resolvable increment as rapidly as, or faster than, 
the full current swing through a diode. 

Third, the mixing of time-derivative waveforms, and 
the subsequent integration of these waveforms pro- 
vides good character appearance through suppression 
of spurious noise and continuity of the integrated 
waveforms. 

Finally, the fidelity of continuous-stroke characters 

Figure 11 — Comparison of straight/cusp and point 
intensified characters of same resolution 


with the above primitives is considerably higher than 
that of dot-intensified, or incremental-vector charac- 
ters of comparable resolution. Such a comparison can 
be made visually by the reader on Figure 11 for a 
resolution of n = 4, or a grid of 16 points. 

The approach discussed in this paper can be further 
extended to a more complete hardware “grammatical” 
structure, through a straightforward extension. That 
is, characters can be constructed from primitives and 
other simpler constructs which are themselves com- 
posed of primitives and/or other constructs of the 
same class. For example, as seen from Figure 4, the 
primitive sequence, S_8, for numeral 8 contains the 
primitive sequence S_S for capital S. That is, S_8 = S_S 
(.8, .5). It is not yet clear whether such a hardware 
structure will result in even lower cost, without sacri- 
fice of performance. 

Finally, we would like to close with the philosophical 
observation that the use of sizable straight-line and 
cusp primitives is well suited to character generation, 
since characters and symbols were generated, in the 
first place, through such strokes, by pen or stick, on 
paper or sand, rather than by dots or by infinitesimal 
straight-line segments. 

INTRODUCTION 

Electronic computers find wide application in 
fields such as document production, compilation of 
printed articles, and language translation. The need for 
high speed printers and display systems for various 
types of characters and symbols is increasing. There 
is a high demand for high speed printers and display 
systems for “Kanji (Chinese characters)” in Japan and 
other Oriental countries. A character generator with 
a font capacity greater than 1000 is required. Nearly 
the same number of character fonts might be needed 
also in Western countries, if special fonts of Greek 
or Roman alphabets, italics, bold face, or special 
mathematical symbols are included. 

A pattern generator is of prime importance because 
it carries out the translation from binary coded in- 
formation into characteristic and symbolic informa- 
tion. The pattern generator is important as it is an 
initial link. It is contained in a printer and acts to con- 
nect the computer and printer. It is of paramount im- 
portance that the pattern generator has high speed 
and high resolution. 

High speed pattern generation systems with a 
font capacity greater than 1000 have been reported. 
However, they were designed for typesetting and 
are consequently too expensive.1,2 In a typical system a 
flying spot scanner and a character grid are utilized. 
The character grid is a pattern carrying film or glass 
plate on which a number of character patterns are 
printed in a negative form. The flying spot scanner 
must exhibit a very high resolution in order to dis- 
criminate more than 1000 characters and symbols 


stored in the character grid. The electronic circuits 
must be of high quality in order to achieve precise 
control of scanning. This makes high speed pattern 
generation systems expensive and bulky. In this paper 
we would like to present a new pattern generating 
system which is low in cost and compact in size. 

System concept 

The attempt to decrease the cost and size of pattern 
generators was carried out by replacing the expensive 
flying spot scanner and associated circuits with other 
means. One such means would be a vidicon, a photo- 
electric conversion device, in which a light-irradiated 
photoconductive element is scanned by an electron 
beam, producing a video signal. However, the vidicon 
cannot attain the resolution of a flying 
spot scanner. It cannot discriminate more than 1,000 
characters and symbols when they are projected 
simultaneously on the face of the vidicon. 

It is apparent that the resolution and discrimination 
characteristics can be improved if a small number of 
larger patterns are projected on the face of the vidicon. 
The use of larger patterns, keeping the total number 
of characters constant, is achieved by introducing 
several character grids. Each grid carries a share of the 
total number of characters and symbols, and one grid at a 
time is selected and projected on the face of the vidicon 
by a selectively energized flash tube. 

Vidicons retain residual image. This property is un- 
favorable for the time-shared use of several character 
grids. Thus the residual image formed on the face of 
the vidicon by a prior projection should be erased 

Figure 1 — Schematic diagram of the character generator 


before the succeeding projection of a different charac- 
ter grid is made. Accordingly, the erasing scan of the 
vidicon face should precede each reading scan. How- 
ever, if an erasing scan is applied to the entire face of 
the vidicon, a certain time would be wasted prior to 
each reading scan. In our pattern generating system the 
erasing scan, or prescanning, is restricted to the area 
where the reading scan is to follow. The remaining area 
remains occupied by the residual image. 

Character generating unit 

Figure 1 shows a schematic diagram of the character 
generator designed for Kanji. Characters and symbols 
are printed in a 16 by 16 matrix form on each of four 
character grids. Four miniature flash lamps, whose 
light emission timing is determined by a control circuit, 
are used to project the real image of the four 
character grids. When one of the flash lamps is selectively 
energized, all the patterns printed on the corresponding 
character grid are projected on the full 
effective area of the target face of the vidicon (type 
8572) by means of a half mirror and a lens having a 
reduction ratio of 1/2. The F-number of the lens is 5.6. 

Generating cycle 

The vidicon consists of a highly evacuated envelope 
containing an electron gun at one end and a transparent, 
optically flat target face at the other (Figure 2). A transparent 
conductive layer is deposited on the inner surface 
of the target face as a signal plate. A photoconductive 
film is deposited on this layer so as to form 
capacitors. At the site of electron impact, the surface 
of the photoconductive layer acquires a negative charge. 
When no light falls on the photoconductive 
layer, its surface is maintained at the cathode (ground) 
potential by electron beam scanning because the layer 
is a good insulator. When a pattern is projected, conduction 
increases in the bright areas. The bright part 
of the pattern enhances the leakage current through 


Figure 2 — Schematic cross-section of the vidicon 

the layer and lets the capacitors discharge during exposure. 
The reading scan which follows the exposure 
restores the negative charge, and the current for the 
restoration produces a video signal across the target 
resistor. 

Generation of a character is accomplished by the 
following sequential steps: 

1. Prescanning the area where the desired char- 
acter is to be projected. 

2. Flashing a xenon lamp in order to project and 
store the character image on the vidicon target. 

3. Scanning the area of a particular character in 
order to pick up the video signal. 

In the prescanning step, the deflection yoke moves 
the electron beam to the position on the vidicon tar- 
get where the character is to be projected, and lets 
the beam form a small raster throughout the area to 
cover the image of character. The raster size is 0.7 mm 
square (about 1/250 of full effective area of the vidicon 
target). It takes 1.5 ms to erase completely the resid- 
ual image stored by preceding flashes. 

Two factors determine the image persistence, 
viz., the transient response of the photoconductive material, 
and the time lag which results from incomplete 
charging of the large-capacitance target by a 
scanning beam of low landing efficiency.3,4 
Generally the photoconductive decay time constant 
is very short, of the order of one ms. On the other hand, 
the capacitive lag makes a predominant contribution 
to the image persistence (of 10 ms) in the standard 
TV application. Since the total target capacitance is 
proportional to the size of the raster, there is no sig- 
nificant capacitance in the present application where 
about 1/250 of the total surface is used. The localized 
scanning reduces the resultant image persistence time 
from 10 ms of TV application to about one millisecond. 

The flashing illumination just after the prescanning 
lasts only 5 µs. Each miniature xenon-flash lamp 
is energized by the discharge of a capacitor, which is 
triggered by a selection pulse. Although more than two 
hundred characters are projected on the target face 
of the vidicon, only one of them is properly stored on the 
vidicon target, because only the corresponding part of the 
vidicon target has been prescanned. 

The last step is the reading scan. The deflection of the 
scanning beam during this step is the same as that of 
the prescanning. However, the video signal at the output 
of the vidicon is taken out through a video gate 
circuit. 

Figure 3 illustrates a portion of a real image. It is 
projected from a character grid onto the target face 
of the vidicon. Owing to the image persistence of the 
vidicon target, the image focused persists for a certain 
period even if the projection is executed for a very 
short time. The prescanning and read scanning of a 
particular area are accomplished by the X and Y deflection 
yoke. The prescanning and read scanning modes 
are illustrated in Figure 3. Examples of prescanning 
and read scanning are shown by the lines superimposed 
on the letter 3. 

The deflection of the scanning beam to any position 
on the vidicon target can be accomplished in 5 µs. 
Linearity and stability of the deflection amplifier are 
approximately 0.1 percent. The bandwidth of the deflection 
amplifier is 500 kHz. 


Figure 3 — Image of a character grid 
projected on the face of the vidicon 


Perpendicularity and residual magnetism of the de- 
flection yoke, and pin cushion or barrel distortion of 
the vidicon are the other factors influencing the posi- 
tioning accuracy of the electron-beam deflection. The 
pin cushion distortion of the vidicon diminishes the 
accuracy considerably. Four small magnets, each of 
size 2 × 2 × 3 mm, placed close to the vidicon 
target correct the distortion. Residual magnetism of 
each magnet is about 2000 G along the longitudinal 
axis. Positions of magnets are adjusted by means of 
screws. In the present system the overall error of beam 
positioning is kept within 0.5 percent of full deflection. 
This is sufficient because the projected characters are 
larger than those in the flying spot system. 

Processing of video signal 

The video signal output of the vidicon is amplified 
and converted to a two-level signal by a video-proc- 
essing circuit. 

As the aperture of the scanning electron beam is not 
very sharp, the video signal contains an intermediate 
level notwithstanding the fact that the character 
grid has two levels of black and white. Considerable 
variations in both modulation depth and dc level oc- 
cur in the video signal depending upon detailed patterns 
of projected characters. Shading of the vidicon also 
causes variations. Thus a simple clipping circuit of con- 
stant clipping level cannot be used. 

Figure 4 shows a block diagram of the video processing 
circuit. The increment of the video signal is 
detected in the differentiation circuit, which consists 
of a 0.5 µs delay line and an integrated differential 
amplifier µPC53. This circuit eliminates the dc-level 
shift from the video signal and sends trigger pulses to 
a flip-flop which converts the video signal to a two-level 
signal. 
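
The behaviour of this circuit can be imitated in software. The Python sketch below is only an illustration of the principle and not the authors' circuit: it takes the difference between each sample and a delayed copy of the signal (the role of the delay line and differential amplifier), and sets or resets a one-bit state (the flip-flop) when that difference exceeds a trigger level. The sample values, the delay expressed in samples, and the `threshold` parameter are all assumptions made for the example.

```python
def binarize(samples, delay=1, threshold=0.2):
    """Convert a video signal to a two-level signal by edge detection.

    samples   -- sequence of signal amplitudes (hypothetical units)
    delay     -- delay-line length in samples (assumed)
    threshold -- trigger level for the flip-flop (assumed)
    """
    state = 0
    output = []
    for i, value in enumerate(samples):
        # Difference between the signal and its delayed copy.
        delayed = samples[i - delay] if i >= delay else samples[0]
        increment = value - delayed
        if increment > threshold:      # rising edge sets the flip-flop
            state = 1
        elif increment < -threshold:   # falling edge resets it
            state = 0
        output.append(state)
    return output


signal = [0.2, 0.2, 0.6, 0.6, 0.6, 0.2, 0.2]
shifted = [v + 0.3 for v in signal]    # same pattern with a dc-level shift
print(binarize(signal))
print(binarize(shifted))
```

Because only increments are examined, the two calls give the same two-level result, which is the property a fixed clipping level lacks.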

System operation 

The control circuit in Figure 1 decodes a pattern- 
representing binary signal, selectively energizes one 
of the flash lamps, and controls the prescanning and 
read scanning in the vidicon so that the desired one 
of the projected patterns is scanned. 



Figure 4 — Block diagram of the video processing circuit 

Two significant bits of the character-representing 
signal are decoded to energize one of the four flash lamps, 
selecting one character grid out of the four. The remaining 
eight bits are supplied to the X and Y direction 
D-A converters in the deflection circuits. 
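
A minimal Python sketch of this decoding follows. The text states only that two bits select the grid and eight bits go to the X and Y converters; the particular bit positions used here (grid in the two highest bits, then four bits for X, then four bits for Y, matching the 16 by 16 matrix) are an assumption, as is the function name.

```python
def decode_character_code(code):
    """Split a 10-bit character code into (grid, x, y).

    Assumed layout: bits 9-8 select one of four grids (flash lamps),
    bits 7-4 give the X position and bits 3-0 the Y position within
    the 16 by 16 matrix of that grid.
    """
    if not 0 <= code < 1024:
        raise ValueError("character code must fit in 10 bits")
    grid = (code >> 8) & 0b11
    x = (code >> 4) & 0b1111
    y = code & 0b1111
    return grid, x, y


print(decode_character_code(0b1000110101))
```

Four grids of 16 × 16 positions give the 1024 addressable characters of the system.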

There are two saw-tooth waveform generators in 
the control circuit, one for X-scanning and the other 
for Y-scanning. The repetition frequency of Y-scanning 
is 20 kHz while that of X-scanning is 0.67 kHz. 
The ratio of these frequencies is determined by the 
number of scanning lines for one character. In the 
present system, each character is scanned by 30 vertical 
lines. The scanning signals from the two saw-tooth 
wave generators are added, respectively, to the character 
selection signals of the X- and Y-axes which are supplied 
from the D-A converters. Figure 5 shows a block diagram 
of the deflection circuit. 

The control circuit produces the gate pulse for the 
video signal as soon as the read-scanning starts. Syn- 
chronizing pulses for X- and Y-axes are available from 
the control circuit for the reconstruction of the char- 
acter images either at display or at printer unit. 

Operating characteristics 

The optical unit of the character generator is shown in 
Figure 6. The size of this unit is 500 mm wide, 600 mm 
long and 150 mm high. The weight is 20 kilograms. Almost 
all the electronic circuits are constructed with ICs. 

The quality of the characters generated by the 
present 1024 font capacity system is sufficiently high. 
Figure 7 shows an example of Japanese sentences displayed 
on a CRT display unit. The storage CRT, which 
needs no costly memory devices for refreshing 
the information, is suitable for this application. 
Figure 8 shows an example of a page printed 
by a fiber optics CRT unit. 

The generating speed of the present system is 330 
characters per second. The machine speed is restricted 
by the persistent lag in the vidicon. In order to decrease 
the erasing time, a new photoconductive layer for 
the vidicon is required. 
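
The quoted speed is consistent with the timing figures given earlier. The Python sketch below adds up the phases of one generating cycle using values from the text (1.5 ms prescanning, 5 µs flash, 5 µs beam deflection, 30 lines read at a 20 kHz line rate). That the cycle consists of exactly these four phases, with no further overhead, is an assumption of the sketch.

```python
LINES_PER_CHARACTER = 30     # vertical scanning lines per character
LINE_RATE_HZ = 20_000        # Y-scanning repetition frequency
PRESCAN_S = 1.5e-3           # time to erase the residual image
FLASH_S = 5e-6               # duration of the xenon flash
DEFLECT_S = 5e-6             # time to position the beam


def read_scan_time():
    """Duration of the reading scan: 30 lines at 20 kHz."""
    return LINES_PER_CHARACTER / LINE_RATE_HZ


def characters_per_second():
    """Generation rate if one cycle is deflect + prescan + flash + read."""
    cycle = DEFLECT_S + PRESCAN_S + FLASH_S + read_scan_time()
    return 1.0 / cycle


print(read_scan_time(), characters_per_second())
```

The reading scan comes to 1.5 ms, the same as one period of the 0.67 kHz X-scan, and the total cycle of about 3 ms corresponds to roughly 330 characters per second.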



Figure 5 — Block diagram of the deflection circuit 



Figure 6 — The optical unit of character generator 

Figure 7 — Displayed Japanese sentences on CRT 


The reliability of the vidicon operated under the unusual 
condition of selective scanning on the target was investigated 
in a running test of about 1000 hours. 
No noticeable change was observed. 

CONCLUSION 

It has been confirmed that the new opto-electronic 
character generating system with 1024 font capacity 
has many advantages such as high font capacity, high 
speed, high quality, low cost and small size. The advantages 
have been achieved by utilizing four character 
grids and a single vidicon. Each character grid 

The generation speed of 330 characters per second 
was realized with the new character generating system. 
Excellent stability was confirmed for a long period of 
operation. 

ISDS — A program that designs computer 
instruction sets 


by F. M. HANEY 

Scientific Data Systems 
El Segundo, California 


INTRODUCTION 

ISDS (Instruction Set Design System), a program 
that designs instruction languages for computers, is 
the result of research aimed at gaining a better under- 
standing of computer-assisted design and, in particular, 
automated design of computers. The primary goal of 
the research was to develop techniques for writing 
programs that solve design problems without intervention 
by human designers. This paper describes a 
program that solves a specific design problem — the 
selection of an order code for a computer — but the 
general approach can be easily adapted to other de- 
sign problems. 

ISDS contains a generalized model of a computer 
instruction set and solves a design problem by filling 
in details of the model, analyzing the result with respect 
to the requirements of the given problem, and selecting 
instances of the model that best meet the requirements 
of the problem. 

The model used by ISDS is GIS (Generalized In- 
struction Set) which is capable of representing a broad 
range of computer instruction sets, including most of 
the features of existing computers. 

The programs that make up ISDS operate at several 
levels, the lowest of which is used to manipulate the 
tree structures storing GIS representations of in- 
struction sets. Other programs in ISDS perform compu- 
tations useful in analysis of instruction sets, select 
values for single values of the instruction set, analyze 
an entire instruction set, and determine the optimal 
method of selecting parts of the instruction set. 


This paper is organized into four sections: 

1. A discussion of design theory and the basis for 
the ISDS approach to constructing design pro- 
grams. 

2. A description of GIS, the model of an instruc- 
tion set that ISDS uses as its basic design con- 
cept in solving a problem. 

3. A description of the programs that make up 
ISDS and the actual operation of ISDS, in- 
cluding an example of an instruction set designed 
by ISDS. 

4. A summary of the results of experimentation 
with ISDS. 

Formalizing the design process 

Before programs that simulate the design process can 
be considered, the complex nature of this process must 
be understood. Many models of the design process have 
been proposed, but for the most part they are the same 
in content if not in detail. However, two men have adequately 
expressed the complexity of the design process: 
Asimow 1 and Alexander. 2 

Asimow considers design as a process of specifica- 
tion during which the solution to a design problem is 
gradually transformed from an abstraction into a 
physical reality. At each step the solution is analyzed. 
If any part of the solution fails to meet some require- 
ment of the design problem, or if other decisions lead 
to better solutions, some parts of the solution may have 
to be re-specified. Dealing primarily with engineering 
design, Asimow identifies over 25 different steps in 
the design process, each dealing with a different level 
of detail. 

Alexander’s view of design is consistent with Asimow’s, 
although Alexander places greater emphasis on 
the relationships between design variables which must 
be considered at each step. The value of any part of 
a solution depends on, and may help to determine, the 
value of other parts of the solution. Since design is 
generally a serial process, the designer must be aware 
of these interactions and be careful about the sequence 
in which he makes design decisions. A particular method 
for treating the relationships between design variables 
is called “design strategy.” 

For many design problems, the truly creative part 
of the process seems to take place in the very early 
stages when the “design concept” is formed. This is 
the most abstract form of the solution, except, of 
course, for the more abstract functional descriptions. 

Some design problems consist of complex sub-problems 
that require creative design, but for many, once 
the design concept is formed the solution is a relatively 
simple process of specifying details in such a way that 
the resulting solution meets the requirements of the 
problem. The fact that many computer instruction 
sets are so similar suggests that the instruction set 
design problem is one for which most solutions can be 
generated by a single design concept. 

This observation is the basis of an approach to 
writing a program that designs instruction sets. The 
trick lies in providing the program with an appropriate 
design concept: one that is general enough to include a 
broad range of instruction sets, yet contains 
enough information to guarantee that the program can 
transform the concept into a solution in a reasonable 
amount of time. 


GIS: A design concept for instruction sets 

Existing instructions for computers have many com- 
mon features. A typical instruction occupies one word 
of the computer memory and consists of several fields 
of information, each encoded in a particular set of 
bits. Most computers have a field containing a code for 
an operation the computer is to perform when it exe- 
cutes the instruction. An instruction may be comprised 
of one or more fields containing addresses of locations 
in memory which embody information to be used during 
execution of the instruction. Some computers allow 
special methods, such as indexing and indirect ad- 
dressing, for specifying data in the main memory of 
the computer. The purpose of GIS is to organize as 


many of these features as possible into a single, general 
model of a computer instruction set. 

Since GIS is a model for a type of language, it can 
be described in the notation of Backus Normal Form. 
The complete description of GIS is rather detailed 
since it includes almost all of the features that have 
been used in instruction sets and a detailed description 
of the meaning of each syntactic feature of GIS. 

For illustration, part of a GIS representation of an 
instruction is: 

<simple instruction> ::= <operation> <left 
operand part> <right operand part> <result 
part> <condition part> <if part> <else part> 

The <operation> part of an instruction in GIS 
may be one of a list of 36 operations including: 

add, subtract, multiply, divide, compare, branch, 
shift, move, logical operations, and others. 

The <left operand part>, <right operand part>, 
<result part>, <if part>, and <else part> are 
<addresses>. 

An <address>, in turn, consists of many parts including 
displacement information, indexing, indirect 
addressing, bits to distinguish between references to 
various types of memory such as main memory or 
register memory, and other special techniques for 
specifying memory locations. 

Each part of an instruction has an interpretation. 
The right and left operand parts specify operands 
which are to participate in the operation. The <result 
part> specifies an address where the result of an operation 
is to be stored. The <condition part> specifies 
some internal condition which may be set as the result 
of the operation. The <if part> specifies the address 
of the next instruction provided that the internal con- 
dition is satisfied and the <else part> specifies the 
address of the next instruction if the internal condi- 
tion is not satisfied. 
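
The parts of a simple instruction and their interpretation can be sketched as a small record type. The Python below is an illustration only, not the IPL-V representation ISDS uses; modeling addresses as plain integers, the field names, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SimpleInstruction:
    """One GIS <simple instruction>; addresses are modeled as integers."""
    operation: str       # e.g. "add", one of the GIS operations
    left_operand: int    # address of the left operand
    right_operand: int   # address of the right operand
    result: int          # address where the result is stored
    condition: str       # internal condition set by the operation
    if_part: int         # next instruction if the condition is satisfied
    else_part: int       # next instruction otherwise


def next_instruction_address(instruction, condition_satisfied):
    """Select the successor address from the <if part> or <else part>."""
    if condition_satisfied:
        return instruction.if_part
    return instruction.else_part


add = SimpleInstruction("add", 10, 11, 12, "zero", 100, 101)
print(next_instruction_address(add, True), next_instruction_address(add, False))
```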

In most instruction sets, some of the GIS parts 
have implicit values. For example, in a single-address 
instruction format one of the operand addresses is al- 
ways assumed to refer to the accumulator. The same 
is true of the result address. The if and else instruction 
addresses are assumed to refer to the next instruction 
in memory. To completely specify an instruction set 
by means of GIS, it is necessary to indicate whether 
each instruction part is implicit or explicit. The assumed 
value must also be specified for implicit parts 
while, for explicit parts, the parts of the instruction 
format used to encode the value of the part must be 
precisely specified. 

GIS can be used to represent almost any instruction 
format in use in existing computers. From a syntactic 
point of view its primary limitation is its list of opera- 
tions, which is necessarily restrictive since some opera- 
tions in actual computers deal with special features 
and cannot be generalized. From a semantic point of 
view, GIS is not capable of all the subtle nuances as- 
signed to certain instructions in some computers. For 
example, GIS makes no distinction between post-in- 
dexing and pre-indexing. In most cases, however, 
these subtleties have little effect on the design of the 
syntax of the instruction language which is of primary 
concern. 

The most important attribute of GIS, so far as the 
design program is concerned, is that it is a design concept 
which appears to represent instruction sets 
at an appropriate level. 

GIS meets the requirement of generality because 
it contains all the important addressing methods as 
special cases. It can be used to represent single ad- 
dress instructions, double- or triple-address instruc- 
tions, memory-to-register instructions, and register- 
register operations, as well as others. 

Another requirement is that a program using GIS 
as its model of an instruction set should be able, with- 
out a great deal of effort, to generate instruction sets 
that are plausible solutions to a design problem. GIS 
possesses this feature in the sense that any instance of 
the GIS model is indeed a valid instruction set. 

ISDS: The design program 

The first step in the construction of ISDS was the 
selection of a method for storing GIS representations 
of instruction sets in the memory of a computer. The 
Backus Normal Form representation of GIS suggests 
a tree-like data structure. The structure actually used, 
called a “form-variable”, is an IPL-V (Information 
Processing Language-V) list structure containing 
each instruction part identified by name and an attribute-value 
description list for each part to store important 
information about the part (whether it is implicit, 
whether the specification is a list of possible values or 
the number of bits needed to encode the item, and 
other descriptive information). 

All of the programs of ISDS are written in IPL-V, 
the primary reason being that IPL-V contains instructions 
for manipulating the tree-like data structure that 
is most appropriate for representing GIS instruction 



Figure 1 — Hierarchy of routines and data in ISDS 


sets in the memory of a computer. However, the form- 
variable is a slightly more specialized data structure 
than the IPL-V list structure. Hence it was necessary 
to write a set of programs for manipulating form- 
variables. 

These form-variable routines add items to form- 
variables, delete items, search for items, find attribute 
values on item description lists, and insert and delete 
attribute values on item description lists. The form-variable 
is a recursive data structure since an item may 
be a single value, a list of values or another form-variable. 
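
The operations listed above can be illustrated with a short Python sketch. It is not the IPL-V implementation described in the paper; the class name, method names and the dictionary-based storage are assumptions chosen only to show a recursive structure of named items, each carrying an attribute-value description list.

```python
class FormVariable:
    """Named items, each with a description list of attribute values.

    An item value may be a single value, a list of values, or another
    FormVariable, which makes the structure recursive.
    """

    def __init__(self):
        self.items = {}          # item name -> value
        self.descriptions = {}   # item name -> {attribute: value}

    def add_item(self, name, value, **attributes):
        self.items[name] = value
        self.descriptions[name] = dict(attributes)

    def delete_item(self, name):
        self.items.pop(name, None)
        self.descriptions.pop(name, None)

    def find_item(self, name):
        """Depth-first search by name; returns None if the item is absent."""
        if name in self.items:
            return self.items[name]
        for value in self.items.values():
            if isinstance(value, FormVariable):
                found = value.find_item(name)
                if found is not None:
                    return found
        return None

    def get_attribute(self, name, attribute):
        """Look up an attribute of an item stored directly in this form-variable."""
        return self.descriptions[name].get(attribute)

    def set_attribute(self, name, attribute, value):
        self.descriptions[name][attribute] = value

    def delete_attribute(self, name, attribute):
        self.descriptions[name].pop(attribute, None)


address = FormVariable()
address.add_item("index", 2, implicit=False)
instruction = FormVariable()
instruction.add_item("left operand part", address, implicit=False)
print(instruction.find_item("index"))
```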

The form-variable routines of ISDS are at the lowest level of a 
hierarchy of routines (see Figure 1*) and are the building 
blocks of other routines in the sense that the higher- 
level routines make use of them to store new items in 
an instruction set, search for an item, and so on. 

The form-variable routines are general in that they 
contain no information about instruction sets, GIS, or 
any aspects of the design process but are merely book- 
keeping programs. ISDS contains another set of pro- 
grams that are general in the sense that they perform 
the numerous computational tasks that must be under- 
taken during the design of an instruction set. These 
tasks include counting the number of items on a list 
and determining the number of bits required to encode 
a list of items. 
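
The two computational tasks named above are simple enough to sketch directly. The Python below assumes a fixed-length binary code, which the paper does not state explicitly; the function names are hypothetical.

```python
def count_items(items):
    """Number of items on a list."""
    return len(items)


def bits_to_encode(items):
    """Bits needed to give each item a distinct fixed-length binary code.

    For n items this is the smallest b with 2**b >= n (0 bits for a
    list of at most one item).
    """
    n = count_items(items)
    return max(n - 1, 0).bit_length()


operations = ["op%d" % i for i in range(36)]   # GIS lists 36 operations
print(count_items(operations), bits_to_encode(operations))
```

Under this assumption, the 36 GIS operations would need a six-bit operation field.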

At the level above the form-variable and computational 
routines, ISDS contains routines that add single 
parts to an instruction set. One such routine, for 


* Figures 1 through 4 from thesis, “Using A Computer to Design 
Computer Instruction Sets”, by Dr. Fred M. Haney, 
Carnegie-Mellon University. 

example, adds a specified number of bits for designating 
index registers in a memory address. The number of 
bits is an input to this routine. 

This routine performs no analysis, but merely the 
bookkeeping required to add a new part to an instruc- 
tion set. The analysis required to determine the num- 
ber of bits to be added for indexing is performed at the 
next level of the hierarchy. One routine, for example, 
adds indexing to the address references of an instruc- 
tion set. For this routine the number of bits is not 
specified. The routine performs the analysis to deter- 
mine the number of bits to be specified and then calls 
its counterpart which adds the specified number of 
bits. The routines which add specified parts to an 
instruction set are called “strategy-level utility rou- 
tines”. The routines which perform analysis and call 
for specific parts to be added are called “operators”. 

The routines in the higher level of ISDS are much 
more specialized than the low-level form-variable 
routines that can be used to represent many different 
kinds of objects. At the next higher level, the strategy- 
level utility routines are intended specifically for con- 
structing instruction sets although they could be used 
in any design strategy since they have no decision 
power. Some decision power begins at the level of the
operators, which are based on a particular view of the
relationships between the different parts of an instruction
set. Each operator uses the values of certain parts
of the instruction set to determine the value of some
new part. The types of possible relationships are
illustrated in Figure 2.

Figure 2: Relationships between the design variables

In many cases, the relationships between parts of
the instruction set are relatively obvious, but different
results could be obtained with a different set
of operators.

So far, nothing has been said about how the operators 
of ISDS are applied. One way is to write a program 
consisting of a sequence of calls on the operators. 
Operators that might be called, for example, are the 
address operator (which selects the number of address- 
es per instruction and the size of each address), the 
indexing operator, the indirect addressing operator, 
the arithmetic operator, and the logical instruction 
operator. (This program would be a specific design 
strategy for the instruction set design problem.) It 
must be recalled that a design strategy is a particular 
method for selecting the parts of a solution to a design 
problem. In particular, a design strategy is a specific 
choice of the independent variables that determine 
each part of the solution, together with a particular 
sequence in which the design decisions were made. As 
was pointed out, the operators represent a particular 
view of the independent variables and their influence 
on each part of the instruction set. The operators could
have been used to write a set of different design strategies.
Instead, a heuristic program was written that determines
its own strategy according to the demands of the design problem.

The statement of the design problem to this program 
consists of the following information: 

1. An optional GIS representation of a particular 
instruction set containing features which must 
be included in the final product. 

2. A cost-value matrix which assigns a relative 
cost and value to each instruction feature of 
GIS. The cost-value matrix also specifies a 
maximum cost for the instruction set. 

3. Optional constraints on instruction features. 

4. Memory size, word size, and byte size of the 
computer. 

The heuristic design program consists of two rou- 
tines; a basic strategy and a search routine. The basic 
strategy uses the memory size and word size to deter- 
mine the number of addresses in each instruction and 
the general format of each address (whether it is a 
memory reference or an address augmented by a base 
register, page bit, etc.). 

After this basic strategy has provided a starting
point, the search routine adds one instruction part at
a time until there is no remaining space in the instruction
format or the cost limit is reached. At each stage
of the specification of the solution, the search routine
tries every operator and evaluates the result with respect
to the value coefficients provided in the statement
of the problem (see Figure 3).

Figure 3: Optimization in ISDS

Corresponding to each operator there is a routine 
that restores the instruction set to its status before the 
operator was applied. 

Hence, the sequence of events at each stage of speci- 
fications is “apply an operator”, “evaluate”, “restore”, 
“apply the next operator”, etc., until all operators have 
been applied, at which time the search routine reap- 
plies the operator that resulted in the greatest im- 
provement in the instruction set. 
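
The apply, evaluate, restore cycle just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ISDS program itself: the `InstructionSet` and `Operator` classes, the bit widths, and the use of a simple sum of values as the evaluation are all hypothetical, and each operator is allowed to succeed only once.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionSet:
    free_bits: int                 # unassigned bits left in the instruction word
    cost_left: int                 # remaining cost budget
    parts: list = field(default_factory=list)
    value: int = 0                 # hypothetical evaluation: sum of part values


@dataclass
class Operator:
    name: str
    bits: int
    cost: int
    value: int

    def apply(self, s: InstructionSet) -> bool:
        # Refuse when the part is already present or does not fit.
        if self.name in s.parts or self.bits > s.free_bits or self.cost > s.cost_left:
            return False
        s.parts.append(self.name)
        s.free_bits -= self.bits
        s.cost_left -= self.cost
        s.value += self.value
        return True

    def restore(self, s: InstructionSet) -> None:
        # Undo exactly what apply() did.
        s.parts.remove(self.name)
        s.free_bits += self.bits
        s.cost_left += self.cost
        s.value -= self.value


def one_step_search(s: InstructionSet, operators: list) -> InstructionSet:
    while True:
        best, best_value = None, s.value
        for op in operators:
            if op.apply(s):                      # apply an operator
                if s.value > best_value:         # evaluate
                    best, best_value = op, s.value
                op.restore(s)                    # restore
        if best is None:                         # no operator improves the set
            return s
        best.apply(s)                            # reapply the best operator


ops = [
    Operator("indexing", bits=4, cost=10, value=10),
    Operator("indirect addressing", bits=1, cost=0, value=20),
    Operator("general registers", bits=4, cost=0, value=10),
]
result = one_step_search(InstructionSet(free_bits=15, cost_left=10), ops)
```

Because every stage tries each operator once, the work per stage grows linearly with the number of operators; looking ahead over sequences of operators would multiply that work at each extra level, which is the geometric growth mentioned in the text.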

The search described above is a one-step search in 
the sense that the instruction set is evaluated after 
application of a single operator. Presumably much 
more interesting strategies could be obtained by evalu- 
ating after the application of sequences of operators, 
but the geometric increase in the computing time 
required made this approach impractical. 

This example illustrates the operation of the heuristic
program described above.

The following inputs were presented to the heuristic program:

1. A cost-value matrix as follows:

   Feature                                     Cost   Value
   Indexing                                      10      10
   Indirect Addressing                            0      20
   General Registers                              0      10
   Partial Word Address                           0       1
   Extra Operations                               0       1
   Permanent Adjustment To Index Registers        0      10


2. A cost constraint of 10. 

3. Required operations of add, subtract, multiply, 
divide, compare, and absolute value for fixed 
point and floating point arithmetic. 

4. Required operations of “negate”, “and”, “or”,
and “no operation” for logical data.

5. A move operation. 

6. Memory size and word size of 65536 words and 
36 bits respectively. 

The basic strategy determines that 16 bits are required
for each main memory address. Since five bits
are needed to encode the required operations, there 
is only room in an instruction word for one address 
without some augmented addressing scheme. The basic 
strategy can specify augmented addressing, but for 
this case it specifies a single, main memory address 
specification of 16 bits. The search strategy specifies 
additional instruction features in the following se- 
quence: general registers, indirect addressing, addi- 
tional operations, additional operations, indexing, a per- 
manent adjustment to an index register after indexing, 
operations, operations, partial word addressing. The 
resulting instruction set has the following format: 


   Bits      Field
   0-5       Operation Code
   6-9       Partial Word Address
   10-13     General Register
   14        Index Adjust
   15-18     Index
   19        Indirect Address
   20-35     Memory Address


This format is almost identical to the format of
the Univac 1108 computer; however, the instruction
set designed by ISDS is not. The primary
difference is in the number of operations in the
two instruction sets. The 1108 permits over 150
operations, whereas the ISDS instruction set contains
only 52 operations.
The instruction sets also differ in their interpretation
of some of the instruction features. However,
this example shows that ISDS is capable
of designing an instruction language that in its
essential features resembles the instruction language
of the Univac 1108.

It is interesting to note in the above example that 
if only 16 or fewer operations are required in the state- 
ment of the problem, then the basic strategy assigns 
four bits for the operation code and the remaining 32 
bits permit two 16-bit memory references. In this case 
the search routine would not be able to apply any of 
the operators since every bit of the instruction word 
is used by the basic strategy. This illustrates a prac- 
tical value of the present heuristic program; i.e., it 
permits a designer to learn by experimentation how 
the different design variables interact and how minor 
changes in one part affect the final product. 
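
The bit arithmetic behind this observation can be checked with a short sketch. The function names are hypothetical, and the count of 17 required operations is an inference from the example inputs (six arithmetic operations in fixed and floating point, four logical operations, and one move); the paper itself states only that five bits are needed.

```python
def bits_to_encode(n_items: int) -> int:
    """Smallest number of bits that can distinguish n_items codes."""
    return max(1, (n_items - 1).bit_length())


def basic_layout(memory_words: int, word_bits: int, n_operations: int):
    """Return (opcode bits, address bits, number of full addresses, spare bits)."""
    address_bits = bits_to_encode(memory_words)
    opcode_bits = bits_to_encode(n_operations)
    room = word_bits - opcode_bits
    n_addresses = room // address_bits
    spare_bits = room - n_addresses * address_bits
    return opcode_bits, address_bits, n_addresses, spare_bits


# 65536 words, 36-bit word, 17 required operations (inferred count).
layout_17 = basic_layout(65536, 36, 17)
# Same machine, but only 16 required operations.
layout_16 = basic_layout(65536, 36, 16)
```

With 17 operations the sketch leaves one address and 15 spare bits for the search routine to fill; with 16 operations it yields two addresses and no spare bits, which is the case in which no operator can be applied.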


SUMMARY 


Working with ISDS indicates that for some design 
problems it is plausible to write programs that solve 
the design problem without human intervention. In 
general, the approach consists of the following steps: 


1. Select a design concept, that is, a model of solutions
   to the design problem.

2. Select a data structure for instances of the design
   concept.

3. Create operators that perform analysis and
   specify single parts of the model.

4. Create programs that use cost, value and constraint
   information from the statement of the
   problem to apply the operators in some sequence
   that results in a solution to the problem.


This process, as it is applied in ISDS, is illustrated 
in Figure 4. 

To be of practical use, a design program based on 
the ISDS approach would require a more sophisti- 
cated search strategy than the one used in the present 
version of ISDS. In general, it is probably possible to 
find clever ways of selecting the operators to be ap- 
plied without actually trying every one. Any such 
scheme would give the search much more direction and 
enable the program to evaluate strategies of depth 
greater than one. 



Figure 4 — ISDS as a design model 


The approach to automated design described is of 
limited use in many practical design problems. How- 
ever, as designers experiment with interactive design 
systems they are likely to discover problems for which 
the so-called creative effort is relatively routine. For 
such problems, the approach of ISDS offers the pros- 
pect of more efficient automation than can be achieved 
in an interactive system.


Statement of the problem

The system engineer operating within the frame- 
work of a typical manufacturing organization operates 
from the following basic information and constraints: 

a. A set of customer specifications to be met,

b. A basic system configuration to be used in realizing
   these specifications,

c. A set of standard components that fit into this
   configuration.

The problem is to determine the collection of components that
satisfies the given specifications at minimum total dollar cost.

The above described situation exists in every area of 
system engineering where the configuration is “fixed” 
and a multitude of candidate components are available. 
The characteristics of these components can be stored 
in computer libraries by part numbers and an analysis 
program can be written to systematically analyze the 
system for any candidate set of components by merely 
inserting the appropriate part numbers. Such computer 
programs are structured so as to retrieve the data for 
each particular component, proceed with the various 
performance calculations and display the results to 
the designer for each set of part numbers manually 
selected. 

This paper goes one step further and presents tech- 
niques and procedures for the effective use of computers 
in automating the solution to the above class of design 
problem. 


Theoretical development 

Development of the analysis program 

The analysis section is the starting point of any com- 
puter-aided or automated design program. Optimiza- 
tion, in the design context, is derived from an efficient 
use of iterative analysis techniques. Devoid of a good 
analysis capability, the designer has nothing. Its pres- 
ence provides a powerful tool in itself. In this case, 
however, it is simply a means to an end — Automated 
Design. 

Although the internal details of the analysis program
vary greatly for different applications, the input-output
characteristics can be readily defined as shown in
Figure 1. The first, and primary, requirement of the
analysis program is that it must accurately represent
the hardware. This requires a significantly detailed
model, including often overlooked nonlinearities, and a
realistic consideration of component tolerance effects.
Second, the outputs of the analysis program must have
a one-to-one correspondence with the list of system
specifications. That is, if the customer specifies overshoot,
response time, accuracy, etc., then the program
must have the capability of calculating the system
performance characteristics in this form. Third and last,
since the analysis is to be repeated many times in an
iterative fashion, the solution time should be a minimum.

Figure 1: Input-output characteristics of system analysis program

The analysis problem is now defined mathematically
by letting S, Y, and X be vectors, defined in general as:

\[
\begin{aligned}
\text{System Specification Vector:} \quad & S = [S_1, S_2, \ldots, S_k] \\
\text{System Performance Vector:} \quad & Y = [Y_1, Y_2, \ldots, Y_k] \\
\text{Component Parameter Vector:} \quad & X = [X_1, X_2, \ldots, X_n]
\end{aligned}
\tag{1}
\]

where

k = number of performance specifications

n = number of component parameters

$S_i$ = numerical value for the i-th specification ($1 \le i \le k$)

$Y_i$ = system performance function corresponding
to the i-th specification ($1 \le i \le k$)

$X_j$ = numerical value for the j-th component parameter
($1 \le j \le n$)

Thus one can write in general that

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_k \end{bmatrix}
=
\begin{bmatrix}
F_1(X_1, X_2, X_3, \ldots, X_n) \\
F_2(X_1, X_2, X_3, \ldots, X_n) \\
\vdots \\
F_k(X_1, X_2, X_3, \ldots, X_n)
\end{bmatrix}
\tag{2}
\]

where the F's represent the functions that need to be
programmed to provide the system analysis. It is
only necessary, at this time, that the X vector contain
the elements required to calculate the system performance
function vector Y. However, it is convenient
to include the component costs as part of the X vector
[even though they will not appear explicitly in (2)] since
they are required to calculate the optimization function
that is introduced later.

Thus (2) can be used to calculate the system per- 
formance vector (Y) given any component vector (X). 
By programming this equation as presented, one ob- 
tains the desired analysis program except for one 
deficiency. That is, due to manufacturing tolerances, 
the X vector varies from unit to unit, and we are in- 
terested not in a particular value of Y but what spread 
or limits to expect. The tolerance effects can be included
by using either the Monte Carlo or the Moment
method (References 1 and 2). The latter technique is used in this paper
since it also provides information that is extremely useful
in minimizing the system cost.

Figure 2: Computer aided design program flow chart

The Moment technique makes use of an expansion
of the function about the mean parameters using a
Taylor series. The higher order terms of the series are
neglected. This requires taking the partial derivative
of each performance variable with respect to each
component parameter. Assuming that the component
performance parameters are independent and noting
that $\partial Y_i / \partial X_j = 0$ if $X_j$ is a component cost, the
mean value and variance of $Y_i$ are given by the equations

\[
\mu_{Y_i} = F_i(\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n})
\tag{3}
\]

\[
\sigma_{Y_i}^2 =
\left[\frac{\partial Y_i}{\partial X_1}\right]^2 \sigma_{X_1}^2
+ \left[\frac{\partial Y_i}{\partial X_2}\right]^2 \sigma_{X_2}^2
+ \cdots
+ \left[\frac{\partial Y_i}{\partial X_n}\right]^2 \sigma_{X_n}^2
\tag{4}
\]


where i = 1, 2, • • •, k and the partial derivatives are 
evaluated while all other parameters are held at their 
mean value. As can be seen from (4), the use of the 
Moment method requires that we calculate the partial 
derivatives of each system performance function with 
respect to each component parameter. The matrix 
of these partials is the Jacobian. 

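\[
J = \frac{\partial(Y_1, Y_2, \ldots, Y_k)}{\partial(X_1, X_2, \ldots, X_n)}
=
\begin{bmatrix}
\dfrac{\partial Y_1}{\partial X_1} & \dfrac{\partial Y_1}{\partial X_2} & \cdots & \dfrac{\partial Y_1}{\partial X_n} \\
\dfrac{\partial Y_2}{\partial X_1} & \dfrac{\partial Y_2}{\partial X_2} & \cdots & \dfrac{\partial Y_2}{\partial X_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial Y_k}{\partial X_1} & \dfrac{\partial Y_k}{\partial X_2} & \cdots & \dfrac{\partial Y_k}{\partial X_n}
\end{bmatrix}
\tag{5}
\]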

The entries in the Jacobian are obtained numerically
by programming (2) and using a subroutine to make the
following steps:

1. Set all the $X_j$'s equal to their mean values ($\mu_{X_j}$);
   the calculated Y vector is taken to be the
   mean value $\mu_Y$.

2. $X_1$ is replaced by ($\mu_{X_1} + \Delta X_1$) and the corresponding
   value of Y is calculated with all other
   X's at their mean value. From this, we obtain
   the first column of the Jacobian matrix using

   \[
   \frac{\partial Y_i}{\partial X_j} \cong \frac{Y_i - \mu_{Y_i}}{\Delta X_j}
   \quad \text{for } i = 1, 2, \ldots, k \text{ and } j = 1
   \]

3. Step 2 is repeated for each $X_j$ for j = 1, 2,
   ..., n, thereby obtaining the complete Jacobian
   matrix.
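
The three steps above, together with the variance formula of the Moment method, can be sketched as follows. This is a minimal illustration, not the program described in the paper: the `analyze` function is a hypothetical two-output system standing in for the F functions of (2), and the relative step size is an assumed value.

```python
import math


def analyze(x):
    # Hypothetical analysis program: two performance functions of two parameters.
    return [x[0] * x[1], x[0] + 2.0 * x[1]]


def jacobian(f, mean_x, rel_step=1e-6):
    """Forward-difference Jacobian of f about the mean parameter values."""
    mean_y = f(mean_x)                           # step 1: Y at the means
    k, n = len(mean_y), len(mean_x)
    J = [[0.0] * n for _ in range(k)]
    for j in range(n):                           # steps 2 and 3: one column per X_j
        dx = rel_step * (abs(mean_x[j]) or 1.0)
        x = list(mean_x)
        x[j] += dx                               # perturb only X_j
        y = f(x)
        for i in range(k):
            J[i][j] = (y[i] - mean_y[i]) / dx
    return mean_y, J


def performance_sigma(J, sigma_x):
    """First-order standard deviation of each Y_i for independent parameters."""
    return [
        math.sqrt(sum((row[j] * sigma_x[j]) ** 2 for j in range(len(sigma_x))))
        for row in J
    ]


mean_y, J = jacobian(analyze, [2.0, 3.0])
sigma_y = performance_sigma(J, [0.1, 0.2])
```

A component cost would simply carry a zero standard deviation and a zero column in J, so it drops out of the sum.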

Development of computer optimization design 
procedure 

Use of the computer-aided design procedure described 
in the previous section, although many times more 
effective than any manual method, nevertheless repre- 
sents only a passive use of the digital computer. That 
is, the engineer makes all the design decisions and the 
computer only serves as a fast calculator. The next 
logical step toward optimized design is to use the com- 
puter to determine how the components should be 
varied to converge on the desired minimum cost system. 

Figure 2 illustrates in general how a computer could 
be used in a dynamic sense. The prerequisite to design 
is to input the data for all components. This is accom- 
plished by loading in the component data cards pre- 
punched in a prescribed format. This need be done only 
the first time and thereafter only if that data is to be 
changed; e.g., updated. These data are then stored by 
part number in an easily retrievable form on magnetic 
disk and are referred to as the “component libraries.” 
In order to provide the mainline design program with a 
guide as to part number selection, some ordered array 
of these is desired. This is accomplished by using a 
“search matrix library,” the precise working of which 
is explained later. Thus, immediately after generation 
of the component libraries, the computer calculates 
the component search matrices and stores these in a 
second block of data — the search matrix library. Now 
the program is ready to be used. The designer inputs 
the system specifications, fixed production labor costs, 
and any initial set of components of his choice. The 
latter item could be made a random selection if desired. 
In either event, the computer retrieves the component 
data from libraries and proceeds to calculate the system 
performance. The component parameters are then perturbed
one at a time and the partials of each system
performance function with respect to each component
parameter are determined. Once this is completed the
partials are stored in the form of a Jacobian matrix.
The calculated performance limits are then compared to
the specification limits. The fraction of the units produced
that statistically fall outside of the specification
limits is then calculated as the “rejection ratio.” From
this rejection ratio, the fixed labor cost, and the summation
of the parts cost, the total cost is calculated. A
printout is then made so that the user can follow the
steps that the computer makes. Following this, some
method must be employed to determine if cost is a 
minimum. If it is, then a final printout can be made. 
If it is not, then an option is shown as to how one wants 
to optimize. This can be accomplished by the user 
reading in another set of part numbers or the computer 
automatically can select a set in the manner described 
in a later section using the search matrix library. This 
procedure is repeated in an iterative manner until the 
optimum design is reached. 

Generation of object functions

The first question that must be answered in an
optimization problem is, “What is to be optimized
and what is optimum?” Often, this is not a trivial
problem in itself since there are many separate and
usually conflicting factors; i.e., minimum cost, maximum
accuracy, small volume, best response, etc. These
factors may be considered simultaneously by defining
a scalar P of the form

\[
P = \sum_{i=1}^{k} A_i (Y_i - D_i)^2
\tag{6}
\]

where

P = object function to be minimized

k = number of desired properties

$A_i$ = weight factor selected to give the i-th property
the desired priority

$Y_i$ = current value of i-th property

$D_i$ = desired value for i-th property

A serious difficulty inherent in this approach, however,
consists in finding a set of weighting factors $A_1, A_2,
\ldots, A_k$ such that scaling between the various terms is
properly considered in order to maintain sensitivity
and obtain good convergence. Considering properties
such as accuracy, weight, cost and response, these
weight selections often become subjective in nature.

It is proposed in this paper that an entirely different
object function shall be used. It is founded on the competitive
philosophy that the manufacturer wants a
design that fulfills the customer requirements at minimum
overall cost. With this result, he can either maximize
his chances of competing or, if his sale price is
“fixed,” he maximizes his profits. Using this minimum
cost philosophy, an appropriate object function can be
generated in the following manner.

The total cost to build a given number of systems
is represented by the equation

\[
\text{Total Cost} = [\text{Number Built}]
\left[\text{Labor Cost} + \sum \text{Component Costs}\right]
\left[1 + \text{Overhead Ratio}\right]
\tag{7}
\]

However, the number that must be built for a given
contract is given by

\[
\text{Number Built} = \frac{\text{Number Required}}{1 - \text{Rejection Ratio}}
\tag{8}
\]

Thus, we have for the total cost

\[
\text{Total Cost} = \frac{\text{Number Required}}{1 - \text{Rejection Ratio}}
\left[\text{Labor Cost} + \sum \text{Component Costs}\right]
\left[1 + \text{Overhead Ratio}\right]
\tag{9}
\]

Since the number of required units and (1 + overhead
ratio) are product terms which are not functions of the
components, one obtains the same cost minimizing
set of components using the function

\[
\text{Cost} = \frac{\text{Labor Cost} + \sum \text{Component Costs}}{1 - \text{Rejection Ratio}}
\tag{10}
\]
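
As a numerical check on the cost relations (7) through (10), the following sketch computes both the full contract cost and the reduced object function. The function names and the sample figures are hypothetical; only the formulas come from the text.

```python
import math


def total_cost(number_required, labor_cost, component_costs,
               overhead_ratio, rejection_ratio):
    """Contract cost per (9): rejected units must be built as well."""
    if not 0.0 <= rejection_ratio < 1.0:
        raise ValueError("rejection ratio must lie in [0, 1)")
    number_built = number_required / (1.0 - rejection_ratio)        # (8)
    unit_cost = labor_cost + sum(component_costs)
    return number_built * unit_cost * (1.0 + overhead_ratio)        # (7)


def fine_search_cost(labor_cost, component_costs, rejection_ratio):
    """Reduced object function (10); infinite when every unit is rejected."""
    if rejection_ratio >= 1.0:
        return math.inf
    return (labor_cost + sum(component_costs)) / (1.0 - rejection_ratio)


# Two hypothetical designs: cheap parts with many rejects, dearer parts with few.
cheap = ([5.0, 15.0], 0.20)
dear = ([8.0, 17.0], 0.02)
full = [total_cost(100, 20.0, c, 0.5, r) for c, r in (cheap, dear)]
reduced = [fine_search_cost(20.0, c, r) for c, r in (cheap, dear)]
```

The two lists differ only by the constant factor 100 x 1.5, so they rank the designs identically, which is why (10) can replace (9) in the search.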

Equation (10) is the object function used for what is
defined later as “the fine search mode.” When it is at a
minimum, the desired optimum set of components has
been defined. However, one problem may exist in the
early portion of the iteration cycle. That is, the design
can be so far away from specification that, for all
practical purposes, the rejection ratio is unity and the
denominator of (10) goes to zero, resulting in infinite
cost. As long as this occurs, (10) has no practical value.
In fact, one loses all sensitivity in calculating partials,
and there is no way of telling if one design is better
than another. For this reason, a “coarse search mode”
is defined. Its corresponding object function is:

\[
Q = \sum_{i=1}^{k} A_i R_i (Y_i - S_i)^2
\tag{11}
\]

where

Q = object function to be minimized

k = number of specifications to be met

$A_i$ = weight factor for i-th specification

$R_i$ = rejection ratio for i-th specification

$Y_i$ = calculated system performance 3 sigma
limit corresponding to i-th specification

$S_i$ = i-th specification limit

It should be further noted that

$Y_i = \mu_{Y_i} - 3\sigma_{Y_i}$ if $S_i$ is a lower limit, and

$Y_i = \mu_{Y_i} + 3\sigma_{Y_i}$ if $S_i$ is an upper limit.

Since Equation (11) is used only in the coarse search
mode, selection of the weight factors is not too critical.
For this study, $A_i$ was set at $1/S_i^2$ except for the case
when $S_i$ equals zero, and then $A_i$ was arbitrarily set
equal to unity.

In the coarse search mode, cost is neglected in an
attempt to determine the performance such that the
rejection ratio becomes less than unity. The incorporation
of the $R_i$ term in (11) greatly aids in the accomplishment
of this condition. First, it nulls each term in
the summation which represents an overdesigned condition
(i.e., $R_i = 0$), and secondly, it applies a linearly
increasing weight on the others according to their
significance.

Once each of the $R_i$'s is driven less than unity, the
cost becomes finite, and the optimization process is
switched from the coarse to the fine search where (10)
is used as the object function.

Calculation of rejection ratio

The total rejection ratio R is the probability of a
design falling outside of the specification, and assuming
that the specification limits are constant, it is given by

\[
R = 1 - \int_{L_{11}}^{L_{12}} \int_{L_{21}}^{L_{22}} \cdots \int_{L_{k1}}^{L_{k2}}
f_{Y_1, Y_2, \ldots, Y_k}(y_1, y_2, \ldots, y_k) \, dy_1 \, dy_2 \cdots dy_k
\tag{12}
\]

where:

$L_{i1} = -\infty$, $L_{i2} = S_i$ for the i-th specification an upper bound

$L_{i1} = S_i$, $L_{i2} = \infty$ for the i-th specification a lower bound

The joint density of the Y's is given by:

\[
f_{Y_1, Y_2, \ldots, Y_k}(y_1, y_2, \ldots, y_k) =
\frac{\exp\left[-\tfrac{1}{2} (Y - \bar{Y}) M_Y^{-1} (Y - \bar{Y})^T\right]}
{(2\pi)^{k/2} \sqrt{|M_Y|}}
\tag{13}
\]

where:

\[
(Y - \bar{Y}) = [(y_1 - \mu_{Y_1}), (y_2 - \mu_{Y_2}), \ldots, (y_k - \mu_{Y_k})]
\]

and the (k x k) covariance matrix $M_Y$ is

\[
M_Y = J M_X J^T
\tag{14}
\]

Since the component performance parameters are
assumed independent and $\sigma_{X_i} = 0$ if $X_i$ is a component
cost, one can write the component covariance matrix
$M_X$ as

\[
M_X =
\begin{bmatrix}
\sigma_{X_1}^2 & 0 & \cdots & 0 \\
0 & \sigma_{X_2}^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_{X_n}^2
\end{bmatrix}
\tag{15}
\]

In order to evaluate R using (12), one must evaluate
the multiple integral of dimension k. This can be accomplished
using numerical techniques; however, the
process is very time consuming. In the interest of
minimizing computer time, one of the three alternate
procedures listed in Table I is best implemented. Each
of these approximations requires calculating only the
individual specification rejection ratios ($R_i$ for i = 1,
2, ..., k), which are given by

\[
R_i = 1 - \frac{1}{\sqrt{2\pi \sigma_{Y_i}^2}}
\int_{L_{i1}}^{L_{i2}}
\exp\left[-\frac{1}{2} \left(\frac{y - \mu_{Y_i}}{\sigma_{Y_i}}\right)^2\right] dy
\tag{16}
\]

Equation (16) can be evaluated by using the standard
error function

\[
\mathrm{ERF}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-u^2} \, du
\tag{17}
\]

using the relationships summarized in Table II.

Since the upper bound approximation is always on
the safe side, it is the one used here. However, the independent
approximation does lie between the two
extremes and thus might be closest to the actual case.
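
A sketch of how the individual rejection ratios and the three estimates of Table I might be computed is given below. It assumes normally distributed performance values and uses the standard identity between the normal distribution function and the error function; the exact relationships of Table II are not reproduced here, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    # Standard normal distribution function expressed through erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def individual_rejection(mean_y, sigma_y, limit, is_upper):
    """Probability that one performance value falls outside its limit."""
    z = (limit - mean_y) / sigma_y
    # Upper limit: reject values above it. Lower limit: reject values below it.
    return 1.0 - normal_cdf(z) if is_upper else normal_cdf(z)


def total_rejection_estimates(r):
    """Return the (upper bound, lower bound, independent) estimates of R."""
    s = sum(r)
    upper = s if s < 1.0 else 1.0
    lower = max(r)
    independent = 1.0 - math.prod(1.0 - ri for ri in r)
    return upper, lower, independent


r_at_limit = individual_rejection(0.0, 1.0, 0.0, is_upper=True)
r_three_sigma = individual_rejection(0.0, 1.0, 3.0, is_upper=True)
upper, lower, independent = total_rejection_estimates([0.1, 0.2])
```

For the sample ratios 0.1 and 0.2 the independent estimate lies between the other two, as the text notes.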

Object function derivatives 

The partial derivatives of the object function must be
calculated in the steepest ascent
method of optimization. If these derivatives were
somehow known for the direct search technique, it
would be an advantage, since one could then conduct
exploratory moves in descending order of importance.
In our case, it would be a major task to perturb each
of the component parameters again and calculate the
resulting change in the object function to obtain the
partial derivatives. It is shown, however, that these
can be obtained directly from the Jacobian matrix,
which is already available from the tolerance calculations;
namely, Equation (5). This is accomplished
in the following manner, as derived first for the fine
search and then for the coarse search.

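The object function used in fine search, Equation
(10), can be written as

\[
C(X) = [K + f(X)] \, [1 - R(X)]^{-1}
\tag{18}
\]

where

X = component parameter vector $[X_1, X_2, \ldots, X_n]$

C(X) = total system cost

K = labor cost

R(X) = rejection ratio

f(X) = $\sum$ component cost

Taking the partial derivative of C with respect to
$X_i$ and expanding to include all X,

\[
\left[\frac{\partial C}{\partial X_1}, \frac{\partial C}{\partial X_2}, \ldots, \frac{\partial C}{\partial X_n}\right]
= \frac{1}{1 - R(X)}
\left[\frac{\partial f}{\partial X_1}, \frac{\partial f}{\partial X_2}, \ldots, \frac{\partial f}{\partial X_n}\right]
+ \frac{K + f(X)}{(1 - R(X))^2}
\left[\frac{\partial R}{\partial X_1}, \frac{\partial R}{\partial X_2}, \ldots, \frac{\partial R}{\partial X_n}\right]
\tag{19}
\]
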


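Expanding the $\partial R / \partial X$ vector in terms of the Jacobian
defined by (5), one obtains the desired matrix equation
for the fine search cost derivative vector as

\[
\left[\frac{\partial C}{\partial X_1}, \frac{\partial C}{\partial X_2}, \ldots, \frac{\partial C}{\partial X_n}\right]
= \frac{1}{1 - R(X)}
\left[\frac{\partial f}{\partial X_1}, \frac{\partial f}{\partial X_2}, \ldots, \frac{\partial f}{\partial X_n}\right]
+ \frac{K + f(X)}{(1 - R(X))^2}
\left[\frac{\partial R}{\partial Y_1}, \frac{\partial R}{\partial Y_2}, \ldots, \frac{\partial R}{\partial Y_k}\right] J
\tag{20}
\]

where:

\[
J =
\begin{bmatrix}
\dfrac{\partial Y_1}{\partial X_1} & \cdots & \dfrac{\partial Y_1}{\partial X_n} \\
\vdots & & \vdots \\
\dfrac{\partial Y_k}{\partial X_1} & \cdots & \dfrac{\partial Y_k}{\partial X_n}
\end{bmatrix},
\qquad
\frac{\partial f}{\partial X_i} =
\begin{cases}
1 & \text{if } X_i \text{ is a component cost} \\
0 & \text{otherwise}
\end{cases}
\]
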
Directed Library Search to Minimize Cost 587 


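Table I: Estimates of total rejection ratio (R)

   Upper Bound:   $R = \sum_{i=1}^{k} R_i$ if $\sum_{i=1}^{k} R_i < 1$; $R = 1$ otherwise
   Lower Bound:   $R = R_j$, where $R_j \ge R_i$ for all $1 \le i \le k$
   Independent:   $R = 1 - \prod_{i=1}^{k} (1 - R_i)$


and the vector

\[
\left[\frac{\partial R}{\partial Y_1}, \frac{\partial R}{\partial Y_2}, \ldots, \frac{\partial R}{\partial Y_k}\right]
\]

is referred to as the “rejection ratio derivative vector”
and given the notation $\partial R / \partial Y$.

The calculation of the $\partial R / \partial Y$ vector, as required for
the fine search mode, depends on the particular equation
used in approximating the rejection ratio R [see Table
I]. We consider here only the case where R is approximated
by the upper bound [see Reference 3 for other
cases]. Since in the fine search mode

\[
\sum_{i=1}^{k} R_i < 1
\]

one has

\[
R \text{ (upper bound)} = R_1 + R_2 + \cdots + R_k
\tag{22}
\]

and since $R_j$ is a function of $Y_i$ only for i = j

\[
\frac{\partial R \text{ (upper bound)}}{\partial Y_i} = \frac{\partial R_i}{\partial Y_i}
\quad \text{for } i = 1, 2, \ldots, k
\tag{23}
\]

and only the partials of the individual rejection ratios
are required.

Considering the specification limit a constant, the
magnitude of $\partial R_i / \partial Y_i$ is given by the $Y_i$ density function
evaluated at the point $y_i = S_i$, and the sign of
$\partial R_i / \partial Y_i$ depends on whether $S_i$ is an upper or a lower
bound. That is

\[
\frac{\partial R_i}{\partial Y_i} = \pm \frac{1}{\sqrt{2\pi \sigma_{Y_i}^2}}
\exp\left[-\frac{1}{2} \left(\frac{S_i - \mu_{Y_i}}{\sigma_{Y_i}}\right)^2\right]
\tag{24}
\]

where

$S_i$ = i-th specification limit

$\mu_{Y_i}$ = mean value of $Y_i$ distribution

$\sigma_{Y_i}$ = standard deviation of $Y_i$ distribution

and the + sign is taken if $S_i$ is an upper limit and the -
sign is taken if $S_i$ is a lower limit.

The object function used for coarse search is of the
form [see (11)]

\[
F(X) = A_1 R_1(X) [Y_1(X) - S_1]^2 + A_2 R_2(X) [Y_2(X) - S_2]^2
+ \cdots + A_k R_k(X) [Y_k(X) - S_k]^2
\tag{25}
\]

Following the same type of procedure as for the fine
search, the coarse derivative vector is found by differentiating
each term of (25) through the chain rule:

\[
\left[\frac{\partial F}{\partial X_1}, \frac{\partial F}{\partial X_2}, \ldots, \frac{\partial F}{\partial X_n}\right]
=
\begin{bmatrix}
A_1 \left[ 2 (Y_1 - S_1) R_1 + (Y_1 - S_1)^2 \dfrac{\partial R_1}{\partial Y_1} \right] \\
A_2 \left[ 2 (Y_2 - S_2) R_2 + (Y_2 - S_2)^2 \dfrac{\partial R_2}{\partial Y_2} \right] \\
\vdots \\
A_k \left[ 2 (Y_k - S_k) R_k + (Y_k - S_k)^2 \dfrac{\partial R_k}{\partial Y_k} \right]
\end{bmatrix}^T
\begin{bmatrix}
\dfrac{\partial Y_1}{\partial X_1} & \dfrac{\partial Y_1}{\partial X_2} & \cdots & \dfrac{\partial Y_1}{\partial X_n} \\
\dfrac{\partial Y_2}{\partial X_1} & \dfrac{\partial Y_2}{\partial X_2} & \cdots & \dfrac{\partial Y_2}{\partial X_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial Y_k}{\partial X_1} & \dfrac{\partial Y_k}{\partial X_2} & \cdots & \dfrac{\partial Y_k}{\partial X_n}
\end{bmatrix}
\tag{26}
\]

Equation (26) gives the desired partial derivatives
of the coarse search object function with respect to
each component parameter in the system. Again, like
(20), it is in terms of the already available Jacobian
matrix and no further parameter perturbations are
required.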
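
The point that no further perturbations are needed can be illustrated for the fine search. The sketch below evaluates the cost derivative vector from quantities that are already on hand: the Jacobian, the rejection ratio derivative vector, and an indicator of which parameters are component costs. The function name and the sample numbers are hypothetical.

```python
def fine_cost_gradient(K, f, R, df_dX, dR_dY, J):
    """Cost derivative vector: a chain-rule product, no new analysis runs.

    K      labor cost
    f      sum of component costs at the current design
    R      total rejection ratio (must be below unity)
    df_dX  1.0 where X_j is a component cost, 0.0 elsewhere (length n)
    dR_dY  rejection ratio derivative vector (length k)
    J      k x n Jacobian of performance with respect to parameters
    """
    n, k = len(df_dX), len(dR_dY)
    # dR/dX_j = sum over i of (dR/dY_i) * (dY_i/dX_j)
    dR_dX = [sum(dR_dY[i] * J[i][j] for i in range(k)) for j in range(n)]
    a = 1.0 / (1.0 - R)
    b = (K + f) / (1.0 - R) ** 2
    return [a * df_dX[j] + b * dR_dX[j] for j in range(n)]


# One specification, two performance parameters, and one cost entry in X.
grad = fine_cost_gradient(
    K=10.0, f=30.0, R=0.2,
    df_dX=[0.0, 0.0, 1.0],
    dR_dY=[0.5],
    J=[[0.2, -0.1, 0.0]],
)
```

A negative entry marks a parameter whose increase is expected to lower the cost, which is the information the directed search uses to order its exploratory moves.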


Design program strategy 

The design program developed as part of this study
has two basic operating options: analysis and directed
search. When operating with the analysis option, the
component part numbers required for each analysis
may be either read in from cards or selected at random
by the program. In either case, as many consecutive
runs are made as requested and a final printout is
provided summarizing the best design obtained. Thus
the engineer can make a rapid evaluation of a selected
number of designs of his choosing, or he can perform
Monte Carlo runs by letting the computer select the
part numbers at random.

With the directed search option, the computer program
uses the object derivatives in connection with
search matrices to direct the next component selection
in an attempt to reduce the object function. This process
is repeated in an iterative fashion until a local
minimum is obtained. Since there is no guarantee that
this condition is the absolute minimum, numerous
starting points are employed and the one with the
lowest cost is assumed to be the best design. The
starting points for each search may be specified by the
user or otherwise selected at random by the program.

The generation of the search matrices is a prerequisite
to a directed search. A separate search matrix is used
along with each component library, and their generation
automatically follows each library update. These
matrices consist of an ordered array of the component
part numbers defined by


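\[
S_i =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1l} \\
s_{21} & s_{22} & \cdots & s_{2l} \\
\vdots & \vdots & & \vdots \\
s_{m1} & s_{m2} & \cdots & s_{ml}
\end{bmatrix}
\tag{27}
\]

where

l = the number of parameters used to describe the
i-th component

m = the number of part numbers for the i-th component
stored in the library

$s_{nj}$ = a component part number for $1 \le n \le m$ and
$1 \le j \le l$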

Each column of S_i corresponds to a particular parameter
of the i-th component, and the entries of the column
consist of all the i-th component part numbers arranged
in ascending order of the mean value of that parameter.
That is, let the j-th column of S_i correspond to the k-th
component parameter of the X vector. Then s_{1j}, s_{2j},
..., s_{mj} are chosen such that

$$
\bar{X}_k(s_{1j}) \le \bar{X}_k(s_{2j}) \le \cdots \le \bar{X}_k(s_{mj}) \qquad (28)
$$

where

\bar{X}_k(s_{nj}) signifies the mean value of the component
parameter X_k for the part number stored in
location s_{nj}
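As an illustration of the ordering rule in (28), the following is a minimal Python sketch of how a search matrix could be generated from a component library. The function name `build_search_matrix` and the dictionary layout of the library are assumptions made for this example; the paper does not describe the program's internal data structures. The four sample entries use the cost and gain limits printed for those parts in Table VI.

```python
def build_search_matrix(library):
    """Return the search matrix as a list of rows.

    `library` maps part number -> list of (xmax, xmin) pairs, one pair
    per parameter (hypothetical layout). Column j of the result lists
    all part numbers in ascending order of the mean of parameter j.
    """
    n_params = len(next(iter(library.values())))
    columns = []
    for j in range(n_params):
        def mean_value(part, j=j):
            xmax, xmin = library[part][j]
            return (xmax + xmin) / 2.0
        # sorted() is stable, so parts with equal means keep library order
        columns.append(sorted(library, key=mean_value))
    # Transpose the list of columns into rows s_n1, s_n2, ...
    return [list(row) for row in zip(*columns)]


# Parameters per part: cost (zero tolerance) and gain (max, min).
library = {
    1001: [(300.0, 300.0), (23.6, 21.4)],
    1002: [(24.0, 24.0), (12.7, 10.3)],
    1004: [(200.0, 200.0), (0.505, 0.495)],
    1015: [(16.0, 16.0), (12.7, 10.3)],
}

for row in build_search_matrix(library):
    print(row)
```

Because each column is sorted independently, a given row of the matrix generally mixes different part numbers; only the position within a column carries meaning.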

In order to explain the strategy used by the design
program to conduct a search, the following definitions
are established.

search           = minimization process which begins
                   with the initial set of part
                   numbers and ends once a local
                   minimum is found.

base point       = set of part numbers for which the
                   object function is less than that
                   calculated for any previous set of
                   part numbers in a given search.

sub-search       = that part of a search which takes
                   place between successive base
                   points.

exploratory move = a set of part numbers which are at
                   least tentatively being considered
                   for a system performance analysis.

failure          = an exploratory move which is
                   analyzed and for which the object
                   function obtained is greater than
                   (or equal to) that of the base point.

success          = an exploratory move for which the
                   object function is less than that
                   of the base point.

local minimum    = the object function corresponding
                   to the base point which remains
                   once all the exploratory moves
                   analyzed in a given sub-search
                   result in failure.
Table II — Equations for calculating individual
rejection ratios (R_i)



Thus a search is made up of many sub-searches and
each of the latter is in turn made up of numerous
exploratory moves. Each exploratory move consists
of changing one component part number while keeping
the others fixed at the base point. Once an exploratory
move results in “success,” the move is defined as a new
base point and a new sub-search is started. This process
is repeated until all the exploratory moves of a
sub-search are exhausted and no success is found. The
base point for this last sub-search defines the local
minimum.
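The relationship between searches, sub-searches, base points, and exploratory moves can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the program's actual strategy: it tries every alternative part for each component in turn, whereas the program described here uses derivative information and the search matrices to choose the moves. The names `local_search`, `candidates`, and `objective`, and the toy cost data, are assumptions for the example.

```python
def local_search(start, candidates, objective):
    """Change one part at a time until no exploratory move succeeds.

    start      : initial part numbers, one per component
    candidates : candidates[i] lists the parts available for component i
    objective  : function mapping a list of part numbers to a number
    """
    base = list(start)
    base_value = objective(base)
    improved = True
    while improved:                      # each pass is one sub-search
        improved = False
        for i, options in enumerate(candidates):
            for part in options:
                if part == base[i]:
                    continue
                move = base.copy()
                move[i] = part           # exploratory move: one part changed
                value = objective(move)
                if value < base_value:   # "success": new base point
                    base, base_value = move, value
                    improved = True
                    break
            if improved:
                break                    # start a new sub-search
    return base, base_value              # base point of the last sub-search


# Toy object function: total of made-up part costs.
costs = {1: 5, 2: 3, 3: 9, 10: 4, 11: 1}
result = local_search([3, 10], [[1, 2, 3], [10, 11]],
                      lambda parts: sum(costs[p] for p in parts))
print(result)
```

With this toy data the search passes through the base points [1, 10] and [2, 10] before stopping at [2, 11].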

The following ten steps describe the general pattern 
of the program’s search strategy: 

1. The object function being minimized is Q
[Equation (11)] while in the coarse search mode
and COST [Equation (10)] while in the fine
search mode. The program is in the coarse
search mode as long as the total rejection ratio
[Equation (12) or Table II] is equal to unity;
once it falls below unity, the program switches to
the fine search mode.

2. Each time a lower object function is found, 
the corresponding part numbers are stored as 
a new base point. 

3. At each new base point, calculations are made 
to establish the object function derivative vector 
using Equation (26) for the coarse search mode 
and (20) for the fine search mode. 

4. Priority and direction vectors are established
as the bases for making exploratory moves. The
priority vector (IPAR) consists of a re-ordering
of the component parameter numbers (i.e.,
subscripts of the X parameter vector) such that

$$
\left|\frac{\partial\,\mathrm{object}}{\partial X_{\mathrm{IPAR}_1}}\right| \ge
\left|\frac{\partial\,\mathrm{object}}{\partial X_{\mathrm{IPAR}_2}}\right| \ge \cdots \ge
\left|\frac{\partial\,\mathrm{object}}{\partial X_{\mathrm{IPAR}_m}}\right| \qquad (29)
$$

where m, the dimension of IPAR, equals the number
of component parameters. The direction vector
(IDEX) is defined by

$$
\mathrm{IDEX}_{II} = -\,\frac{\partial\,\mathrm{object}/\partial X_{II}}
{\left|\partial\,\mathrm{object}/\partial X_{II}\right|}
\quad \text{for } 1 \le II \le m \qquad (30)
$$

Thus

IDEX_II = +1 if the II-th parameter should be
             increased

        = -1 if the II-th parameter should be
             decreased

in order to achieve a reduction in the object
function.

5. A “sub-search progress number,” denoted by
the symbol II, is used by the program as the
subscript for the IPAR and IDEX vectors.
It is initialized equal to unity (i.e., II = 1) at
the beginning of each sub-search and incremented
under program control as the sub-search
progresses. As II is increased from one to m,
IPAR_II corresponds to the component parameter
numbers having decreasing sensitivity
values with respect to the object function.
Likewise, IDEX_{IPAR_II} corresponds to the desired
direction in which the IPAR_II parameter is to be
changed.

6. Each exploratory move is initiated by calling a
subroutine, named SEARCH, to select the new
part number which is to be investigated. This
is accomplished using the statement:

CALL SEARCH [IDEX_{IPAR_II}, IPAR_II, IPN_JJJ, IBOUND]

where

IDEX_{IPAR_II} = direction in which the IPAR_II
                 parameter is to be changed

IPAR_II        = parameter number for the change
                 being considered

IPN_JJJ        = present part number on entering
                 the subroutine; on return it is
                 the new part number to be used

IBOUND         = 0 unless the present part number
                 is already at the boundary and
                 cannot be changed further, in
                 which case it is set to 1 by the
                 subroutine

JJJ            = component library number

The SEARCH subroutine takes the IPAR_II
entry which corresponds to the subscript of the
X vector and seeks the corresponding column of
the appropriate search matrix. This column is
then searched until the currently used part number
(IPN_JJJ) is found. Once this occurs the subroutine
increments either down or up one location
depending on whether IDEX is +1 or -1
and replaces the old part number with the new
one found. If the old part number happens to
be on a boundary such that a new part number
cannot be obtained, the subroutine sets IBOUND
to 1 and returns with the old part number.
If this occurs, no further minimization can be
obtained considering the IPAR_II parameter;
therefore one returns the part numbers to the
base point and increments to the next most
significant parameter by increasing the
sub-search progress number (II) by 1, and step 6
is repeated.

7. For each new component selected by SEARCH,
a library subroutine, named LIBR, is called to
retrieve the corresponding parameter data.
This is accomplished by the statement

CALL LIBR [IPN_JJJ, XMAX, XMIN]

where

IPN_JJJ = part number for which data is desired

XMAX    = a vector containing the mean +3 sigma
          values for the total X parameter vector

XMIN    = a vector containing the mean -3 sigma
          values for the total X parameter vector

The LIBR subroutine takes the part number
(IPN_JJJ) and searches the appropriate component
library, stored off-line on magnetic disk, until the
part number is located. Once it is located, its
associated parameter data is read back and inserted
in the proper locations of the XMAX and
XMIN vectors. Thus by calling the LIBR subroutine
with a part number, one is able to automatically
update the three sigma limits for the
X's corresponding to that part, leaving the
others unchanged.

8. After the new data is obtained for the exploratory
move, the program checks for the existence of
two conditions before the system performance
is evaluated. The first is used to control the extent
to which the program explores changes based on a
given parameter before it moves on to the next
parameter. This is accomplished by calculating
a normalized distance (DIST) according to

$$
\mathrm{DIST} =
\begin{cases}
\dfrac{\mathrm{XMIN}_i}{\mathrm{XMAXS}_i} & \text{for } \mathrm{IDEX}_i > 0 \\[2ex]
\dfrac{\mathrm{XMINS}_i}{\mathrm{XMAX}_i} & \text{for } \mathrm{IDEX}_i < 0
\end{cases}
\qquad (31)
$$

where i = IPAR_II

XMAXS = a vector containing the
        mean +3 sigma values for
        the total X parameter
        vector for the base point.

XMINS = a vector containing the
        mean -3 sigma values for
        the total X parameter
        vector for the base point.

This normalized distance is then compared to a
program input parameter XNN. For XNN >
1, one is assured that the X_{IPAR_II} random variable
has been varied so that its frequency distribution
inside the 3 sigma limits lies outside the
distribution for the corresponding base point
parameter. Thus by selecting the value of XNN,
the program user can control the extent to which
exploratory moves are made. A value of XNN =
1.5 was found to give satisfactory results. By
making XNN larger one explores more possibilities
at the expense of increased computer
time. Thus, for DIST > XNN the program
returns the part numbers to the base point,
increments to the next most significant parameter
by incrementing the sub-search progress
number by one, and returns to step 6 above by
calling SEARCH. If DIST < XNN, the program
continues to make the second check.
This second check consists of calculating the
estimated change in the object function based
on its first derivative vector using the equation

$$
\Delta\mathrm{object} = \sum_{i=1}^{m}
\frac{\partial\,\mathrm{object}}{\partial X_i}
\left[\mathrm{XNOM}_i - \mathrm{XNOMS}_i\right] \qquad (32)
$$

where XNOM and XNOMS are the mean component
parameter vectors corresponding respectively
to the exploratory part number
vector and the base part number vector. Since
the i = IPAR_II term in (32) is negative, one
knows that if Δobject turns out to be positive,
the summation of the changes caused by the
parameters in IPN other than IPAR_II has
resulted in an estimated increase in the object
function. Since a positive Δobject is undesirable,
one returns to step 6 above when
Δobject > 0 and calls SEARCH keeping the
same sub-search progress number (II). If
Δobject < 0, a complete system performance
analysis is made using the exploratory move
part numbers.

9. If the exploratory move turns out to be “a
success” (i.e., the object function is reduced)
one returns to step 2 above and the process
is repeated. If it is “a failure” (i.e., the object
function is not reduced) one returns to step 6
and the next exploratory move is investigated.

10. The optimization procedure terminates once 
all the exploratory moves made from a given base 
point are completed “without success.” This 
base point defines the local minimum. 
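Steps 4 through 6 can be illustrated with a short Python sketch. It is an interpretation under stated assumptions rather than the program's code: indices are 0-based, the sign convention for IDEX is inferred from the statement that IDEX = +1 means the parameter should be increased to reduce the object function, and a zero derivative (not covered in the text) is given direction 0 and treated like a boundary. The function names `priority_and_direction` and `search_step` are hypothetical.

```python
def priority_and_direction(derivs):
    """Build the IPAR and IDEX vectors from d(object)/dX_i.

    ipar lists parameter indices by decreasing |derivative|;
    idex[i] is +1 if increasing X_i should reduce the object
    function, -1 if decreasing it should, 0 if the derivative is zero.
    """
    ipar = sorted(range(len(derivs)), key=lambda i: -abs(derivs[i]))
    idex = [0 if d == 0 else (-1 if d > 0 else 1) for d in derivs]
    return ipar, idex


def search_step(column, part, direction):
    """Move one location along a search-matrix column.

    `column` holds part numbers in ascending order of the parameter's
    mean value. Returns (new_part, ibound); ibound is 1 when the part
    is already at the boundary and cannot be changed further.
    """
    new_pos = column.index(part) + direction
    if direction == 0 or not 0 <= new_pos < len(column):
        return part, 1
    return column[new_pos], 0


ipar, idex = priority_and_direction([0.5, -2.0, 1.0])
print(ipar, idex)

column = [1015, 1025, 1019]             # illustrative column
print(search_step(column, 1025, +1))    # moves to the next higher mean
print(search_step(column, 1019, +1))    # already at the boundary
```

In the program the direction passed to SEARCH would be the IDEX entry for parameter IPAR_II, with II advanced whenever IBOUND comes back as 1.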

Figure 3 summarizes the described design strategy
in the form of a flow chart for the computer program.
For simplicity, only the logic fundamental to
the directed search option is included.
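The two screening checks of step 8 might be sketched as follows. The ratio form used for DIST is a reading of Equation (31) as printed and should be treated as an assumption, as should the handling of Δobject exactly equal to zero, which the text does not specify. All function names are hypothetical, and the limits are assumed to be nonzero.

```python
def normalized_distance(xmax, xmin, xmaxs, xmins, i, idex_i):
    """DIST for parameter i (assumed form of Equation (31)).

    xmax, xmin   : 3-sigma limits for the exploratory part numbers
    xmaxs, xmins : 3-sigma limits for the base point
    """
    if idex_i > 0:
        return xmin[i] / xmaxs[i]    # parameter is being increased
    return xmins[i] / xmax[i]        # parameter is being decreased


def estimated_change(derivs, xnom, xnoms):
    """First-order estimate of the change in the object function (32)."""
    return sum(d * (x - xs) for d, x, xs in zip(derivs, xnom, xnoms))


def screen_move(dist, delta_object, xnn=1.5):
    """Decide what to do with an exploratory move before analysis."""
    if dist > xnn:
        return "next_parameter"         # explored far enough on this one
    if delta_object > 0:
        return "retry_same_parameter"   # estimated increase: call SEARCH again
    return "analyze"                    # zero is treated as acceptable here


print(normalized_distance([14.0], [12.0], [6.0], [5.0], 0, +1))
print(estimated_change([2.0, -1.0], [3.0, 5.0], [1.0, 4.0]))
print(screen_move(1.2, -0.3))
```

The default `xnn=1.5` is the value the text reports as giving satisfactory results.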

Automated design example 

Application problem 

The example presented here is the automated de- 
sign of an instrument servomechanism consisting of a 
follow-up device, electronic amplifier, drive motor with 
feedback generator, and geartrain. A pictorial diagram 
showing a fixed system configuration using these com- 
ponents is shown as Figure 4. 

It is assumed that a design of this configuration must
meet up to five preassigned specifications in the areas
of damping, accuracy, and time response. Table III
lists the specifications by name and vector notation,
tells whether each specification is an upper or lower
bound, and gives the units used.

Figure 3 — Directed search basic program logic


Table III — System specifications

Name              Symbol   Boundary   Units
Static accuracy   S_1      upper      degrees
Resolution        S_2      upper      degrees
Velocity lag      S_3      upper      degrees
Follow-up rate    S_4      lower      deg/sec
Damping ratio     S_5      lower      -
Figure 4 — Schematic diagram of motor-generator
instrument servomechanism

Four component libraries are established to list the
part characteristics as follows:

a. Follow-up — 25 part numbers

b. Amplifier — 50 part numbers

c. Motor-generator — 25 part numbers

d. Geartrain — 25 part numbers

Even though the size of each demonstration library was
purposely kept small, the number of theoretically possible
candidate systems is large; namely, 25 × 50 × 25 ×
25 = 781,250.
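The count of candidate systems is simply the product of the library sizes, which a few lines of Python confirm. The dictionary name `library_sizes` is chosen for this example only.

```python
import math

# Number of part numbers in each demonstration library
library_sizes = {
    "follow-up": 25,
    "amplifier": 50,
    "motor-generator": 25,
    "geartrain": 25,
}

total_systems = math.prod(library_sizes.values())
print(total_systems)  # 781250
```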

The optimum collection of components is defined as 
“the one that satisfies the given specification in a man- 
ner resulting in minimum total cost.” 

Component libraries and search matrices 

The design equations corresponding to the five
specifications are listed in Table IV [see Reference (4)
for their derivation]. By grouping the parameters
shown in Table IV according to component and adding
the corresponding component cost, one obtains the X
parameter vector as summarized in Table V.

In addition to specifying any desired combination
of the above described five performance requirements,
the user must also define the load that the servo is to
drive. For the example program developed, the load
is represented by an inertia (J_L) and a coulomb friction
(T_L). These are shown as X_21 and X_22 of Table V.

The components selected to make up the libraries for
this study, chosen so as to provide a broad base of design,
are typical of those used throughout the servomechanism
industry. An example of the parameters
used is shown in Table VI, which consists of the values
in the follow-up component library.

Table IV — System design equations

Name              Symbol   Equation Used
Static Accuracy
Resolution
Velocity Lag
Follow-up Rate
Damping Ratio

[The symbol and equation columns of Table IV are not legible
in this copy; see Reference (4) for the derivations.]

Each column of the library data is labeled with the
appropriate X-vector notation; i.e., X_1, X_2, ..., X_20,
each of which is assumed to be a random variable with
a normal distribution defined for each component by
the mean ± 3 sigma limits given by the MAX
and MIN values shown. The variables X_i for i = 1, 4,
9, 16, 17, and 20, which are the individual component
costs, motor rated voltage and the gear ratio and have
no manufacturing tolerance, are still treated as “random
variables” but having zero variance; i.e., XMAX_i =
XMIN_i.
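Since the MAX and MIN library entries are the mean plus and minus three sigma, the parameters of the assumed normal distribution follow directly. The short Python sketch below shows the conversion; the function name `normal_from_limits` is hypothetical, and the two sample calls use the gain and cost limits printed for part 1001 in Table VI.

```python
def normal_from_limits(xmax, xmin):
    """Recover (mean, sigma) from the mean +/- 3 sigma limits."""
    mean = (xmax + xmin) / 2.0
    sigma = (xmax - xmin) / 6.0
    return mean, sigma


gain_mean, gain_sigma = normal_from_limits(23.6, 21.4)     # follow-up gain
cost_mean, cost_sigma = normal_from_limits(300.0, 300.0)   # cost: no tolerance
print(gain_mean, gain_sigma)
print(cost_mean, cost_sigma)
```

A parameter with XMAX equal to XMIN comes out with sigma equal to zero, which is the zero-variance case described above.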

The search matrices are generated immediately after 
the library data is stored in the computer system. The 
search matrix for the follow-up is shown as Table VII 
and consists of the follow-up component part numbers 
arranged in an ordered array. 

Computer solution 

In order to demonstrate the application of the program
in its most comprehensive form, a customer requirement
is assumed which makes use of all five specifications.
The particular set is:

1. Static accuracy = 0.35 degrees

2. Resolution = 0.3 degrees

3. Velocity lag for 300 deg/sec input = 5 degrees

4. Follow-up rate = 300 deg/sec

5. Damping ratio = 0.5

The assumed labor cost is $200.

The results obtained using the program in the directed
search mode are now illustrated in detail for three
searches. The first, shown in Table VIII, is a case where
the initial guess fails completely to meet three out of
Table V — Component vector notation for library

COMP              VAR     PARAMETER NAME            SYMBOL   UNITS

FOLLOWUP          X_1     Cost                      C_f      dollars
                  X_2     Gain                      K_f      volts/rad
                  X_3     Accuracy                  θ_f      minutes

AMPLIFIER         X_4     Cost                      C_a      dollars
                  X_5     Gain to Followup          K_af     volts/volt
                  X_6     Gain to Generator         K_ag     volts/volt
                  X_7     Output Saturation Level   E_sat    volts
                  X_8     Output Null Voltage       E_an     volts

MOTOR GENERATOR   X_9     Cost                      C_m      dollars
                  X_10    Stall Torque              T_s      oz-in
                  X_11    No-Load Speed             θ_m      rpm
                  X_12    Inertia                   J_m      gm-cm^2
                  X_13    Starting Voltage          E_s      volts
                  X_14    Generator Gain            K_g      volts/1000 rpm
                  X_15    Generator Null            E_gn     millivolts
                  X_16    Rated Control Voltage     E_c      volts

GEARTRAIN         X_17    Cost                      C_g      dollars
                  X_18    Inertia                   J_g      gm-cm^2
                  X_19    Friction                  T_g      oz-in
                  X_20    Gear Ratio                N        -

LOAD              X_21    Inertia                   J_L      gm-cm^2
                  X_22    Friction                  T_L      oz-in
Table VI — Followup library data

            X_1         X_2                     X_3
PART NO.    COST        FOLLOWUP GAIN           ACCURACY
            (DOLLARS)   (VOLTS/RAD)             (MIN OF ARC)
                        MAX.        MIN.        MAX.     MIN.

1001        300.00      23.6000     21.4000       1.0    0.0
1002         24.00      12.7000     10.3000      10.0    0.0
1003         35.00      24.8000     20.2000       7.0    0.0
1004        200.00       0.5050      0.4950      30.0    0.0
1005        600.00       0.5025      0.4975      10.0    0.0
1006         28.00      24.8000     20.2000      15.0    0.0
1007         40.00      12.1000     10.9000       3.0    0.0
1008         36.00      12.1000     10.9000       7.0    0.0
1009         22.00      12.7000     10.3000      15.0    0.0
1010         30.00       0.5050      0.4950     120.0    0.0
1011         95.00      11.7000     11.3000       2.0    0.0
1012         90.00       0.5050      0.4950      60.0    0.0
1013        300.00       0.5050      0.4950      15.0    0.0
1014         60.00      24.8000     20.2000       3.0    0.0
1015         16.00      12.7000     10.3000      30.0    0.0
1016         30.00      25.9000     19.1000      10.0    0.0
1017        260.00      11.7000     11.3000       1.0    0.0
1018        150.00      23.6000     21.4000       2.0    0.0
1019         20.00      27.0000     18.0000      30.0    0.0
1020         28.00       0.5150      0.5050     180.0    0.0
1021         26.00       5.5000      4.5000      10.0    0.0
1022         30.00       5.2500      4.7500       5.0    0.0
1023         20.00       5.5000      4.5000      15.0    0.0
1024         28.00       5.2500      4.7500       7.0    0.0
1025         18.00       5.5000      4.5000      30.0    0.0


Table VII — Followup search matrix


the five required specifications, thus resulting in an
infinite cost (shown as **** when cost > 1 × 10^6
dollars). Each line represents an analysis run and lists
the cost (10), scalar (11), total reject [upper bound of
(12)], the four component part numbers used, and the
individual specification rejection percentages [R_i
using (16) for i = 1, ..., 5]. Fifty-five iterations
are required by the program to minimize the scalar
object function to the point where the cost becomes
finite and the program switches from the coarse
to the fine search mode. It should be noted that for
this and subsequent computer runs, the intermediate
printout is eliminated for all iterations where the
scalar (cost when in fine search) is not reduced. These
are considered “failure iterations,” as is the case for
numbers 2, 6, 7, etc., for the coarse search in Table
VIII.

Once the program is in the fine search mode, the 
cost is minimized up to run number 202 where it is 
reduced from $38,261.30 to $374.27. As shown, an ad- 
ditional 23 iterations are required according to the 
termination procedure, as explained in an earlier sec- 
tion, in order to establish that part numbers 1009, 
2003, 3002, and 4014 establish a local minimum. 

Table IX illustrates the results obtained from the 
second search. This case represents the opposite condi- 
tion where the initial guess at first hand looks like a 


COST     K-F      THETA

1015     1010     1017
1025     1005     1001
1023     1004     1018
1019     1012     1011
1009     1013     1014
1002     1020     1007
1021     1024     1022
1006     1021     1024
1024     1025     1003
1020     1022     1008
1022     1023     1021
1010     1015     1005
1016     1007     1002
1003     1011     1016
1008     1009     1013
1007     1017     1006
1014     1008     1023
1012     1002     1009
1011     1014     1015
1018     1006     1025
1004     1003     1019
1017     1018     1004
1013     1016     1012
1001     1001     1010
1005     1019     1020


“reasonable design”; i.e., the rejection is only 0.77
percent. However, after 74 iterations in the directed
search mode, the cost has been reduced from the
original design value of $555.30 to only $374.27,
a savings of $181.03 per unit! The computer run time
was less than one minute.

The third search is shown in Table X where this
time the initial parts result in a design which fails completely
to meet four out of the five specifications. After
55 iterations, the program has reduced the scalar from
59,610,000 to 3.396 and only one specification remains a
complete failure; however, this point turns out to be
a local minimum and no further reduction is obtained.

A total of 15 searches was made and the local 
minimums found and their frequencies are summarized 
in Table XI. Based on the results listed in Table XI, 
the system obtained using part numbers 1009, 2003, 
3002 and 4014 is assumed to be the best design. The 
final computer printout sheet summarizing this com- 
bination is shown as Table XII. 
Table VIII — Directed search with initial guess underdesigned

********** BEGIN SEARCH NUMBER 2 *********

RUN                           TOTAL    COMPONENTS SELECTED       ******* INDIVIDUAL REJECTIONS *****
NO.   COST       SCALAR       REJECT   FOUP  AMP   MOGEN GRTR    STATIC  RES     LAG     FURATE  DAMP    MODE

  1   *********  8.722E+02    100.00   1005  2046  3015  4022    ?       ?       ?       ?       ?       1
  3   *********  7.147E+02    100.00   1005  2046  3012  4022    100.00  100.00  100.00    0.00    1.09  2
  4   *********  3.042E+02    100.00   1005  2046  3021  4022    100.00  100.00  100.00    0.0   100.00  2
  5   *********  8.662E+00    100.00   1021  2046  3021  4022     18.45   82.34    0.0     0.0   100.00  2
  8   *********  8.61?E+00    100.00   1024  2046  3021  4022      4.18   82.88    0.0     0.0   100.00  2
 12   *********  8.611E+00    100.00   1022  2046  3021  4022      0.90   82.88    0.0     0.0   100.00  2
 16   *********  5.330E+00    100.00   1022  2046  3021  4024      0.01   21.95    0.0     0.00  100.00  2
 18   *********  4.497E+00    100.00   1022  2046  3012  4024     55.55  100.00    0.0     0.03  100.00  2
 34   *********  4.334E+00    100.00   1007  2046  3012  4024      0.0     0.00    0.0     0.03  100.00  2
 35   *********  3.359E+00    100.00   1007  2046  3023  4024      0.00   63.32    0.20   56.16   94.76  2
 40   *********  3.291E+00    100.00   1007  2046  3005  4024      0.00   74.21   27.87   88.46   55.68  2
 41   *********  3.207E+00    100.00   1007  2046  3011  4024      0.00   21.45   10.67   54.19    6.75  2
 55   38261.30   3.222E+00     98.76   1011  2046  3011  4024      0.00   21.17   16.88   54.19    6.52  2
 76    2939.50   3.213E+00     84.65   1011  2046  3011  4023      0.0    11.20   16.30   51.22    5.93  3
109     793.85   4.742E+00     23.32   1011  2025  3011  4023      0.0     0.0     0.00   28.32    0.0   3
119     565.04   4.254E+00      0.01   1011  2028  3011  4020      0.0     0.0     0.0     0.01    0.00  3
120     544.01   5.276E+00      0.00   1011  2025  3002  4020      0.0     ?       0.00    0.00    0.00  3
122     537.01   5.349E+00      0.00   1011  2025  3002  4014      0.0     0.0     0.00    0.00    0.00  3
127     527.05   2.896E+01      0.01   1011  2041  3002  4014      0.0     ?       ?       0.0     0.0   3
133     527.00   1.636E+01      0.0    1011  2030  3002  4014      0.0     ?       ?       0.0     0.0   3
136     522.01   6.494E+00      0.00   1011  2033  3002  4014      0.0     ?       ?       0.00    0.00  3
149     487.00   5.766E+00      0.00   1011  2048  3002  4014      0.0     ?       ?       ?       0.00  3
159     432.00   5.694E+00      0.00   1007  2048  3002  4014      0.0     0.0     0.0     0.00    0.00  3
162     428.00   5.450E+00      0.00   1008  2048  3002  4014      0.0     0.0     0.0     0.00    0.00  3
172     426.94   5.696E+00      1.16   1016  2048  3002  4014      0.0     0.0     0.0     0.00    1.16  3
178     424.2?   5.522E+00      1.01   1006  2048  3002  4014      0.00    0.0     0.0     0.00    1.01  3
181     416.00   5.293E+00      0.00   1002  2048  3002  4014      0.00    0.0     0.0     0.00    0.00  3
184     414.00   5.144E+00      0.00   1009  2048  3002  4014      0.00    0.0     0.0     0.00    0.00  3
202     374.27   6.973E+00      1.41   1009  2003  3002  4014      0.00    0.0     1.40    0.00    0.00  3
225     374.27   6.973E+00      1.41   1009  2003  3002  4014      0.00    0.0     1.40    0.00    0.00  4

(? = not legible in this copy)


Table IX — Directed search with initial guess overdesigned

********** BEGIN SEARCH NUMBER 1 *********

RUN                           TOTAL    COMPONENTS SELECTED       ******* INDIVIDUAL REJECTIONS *****
NO.   COST       SCALAR       REJECT   FOUP  AMP   MOGEN GRTR    STATIC  RES     LAG     FURATE  DAMP    MODE

  ?     555.30   ?              0.77   1006  2050  3016  4013      0.04    0.0     0.73    0.00    ?     1
  ?     550.65   4.030E+00      0.48   1006  2050  3002  4013      ?       0.0     0.48    0.00    ?     ?
  ?     547.11   4.751E+00      0.02   1006  2050  3002  4020      ?       0.0     0.0     0.0     ?     ?
  ?     544.01   3.936E+00      0.00   1006  2050  3002  4009      0.00    0.0     0.0     0.00    0.00  3
  6     540.10   4.821E+00      0.02   1006  2050  3002  4014      0.00    0.0     0.0     0.0     0.02  3
 12     535.89   4.537E+00      0.17   1002  2050  3002  4014      ?       0.0     0.17    0.0     0.00  3
 26     466.38   4.980E+00      0.08   1002  2025  3002  4014      ?       0.0     0.08    0.00    0.00  3
 27     464.79   4.890E+00      0.17   1009  2025  3002  4014      0.02    0.0     0.15    0.00    0.00  3
 36     454.88   2.802E+01      0.19   1009  2041  3002  4014      0.00    0.0     0.19    0.0     0.0   3
  ?     454.00   1.550E+01      0.00   1009  2030  3002  4014      0.00    0.0     0.0     0.0     0.0   3
  ?     449.01   5.740E+00      0.00   1009  2033  3002  4014      0.00    0.0     0.0     0.00    0.00  3
  ?     414.00   5.144E+00      0.00   1009  2048  3002  4014      0.00    0.0     0.0     0.00    0.00  3
  ?     374.27   6.973E+00      1.41   1009  2003  3002  4014      0.00    0.0     1.40    0.00    0.00  3
  ?     374.27   6.973E+00      1.41   1009  2003  3002  4014      0.00    0.0     1.40    0.00    0.00  4

(? = not legible in this copy)


The claim that the above $374.27 local minimum
is also the absolute minimum can be checked, for this
example, by using the following procedure.
The lowest possible cost for a system made up of any
collection of components is the summation of the
individual component costs and the labor cost, since if
there are rejects, they only increase this cost. Therefore,
to test if a local minimum is also the absolute minimum,
one need analyze only the subset of the total combinations
for which

labor cost + Σ component costs < local minimum (33)

If it turns out that analyzing each system in this subset
Table X — Directed search resulting in an unsatisfactory local minimum

********** BEGIN SEARCH NUMBER 3 *********

RUN                           TOTAL    COMPONENTS SELECTED
NO.   COST       SCALAR       REJECT   FOUP  AMP   MOGEN GRTR

  1   *********  5.961E+07    100.00   1013  2014  3010  4017
  2   *********  4.616E+07    100.00   1013  2014  3014  4017
  3   *********  4.013E+07    100.00   1013  2014  3009  4017
  5   *********  2.078E+07    100.00   1013  2014  3004  4017
  7   *********  1.919E+07    100.00   1013  2014  3016  4017
  8   *********  1.812E+07    100.00   1013  2014  3002  4017
 10   *********  1.094E+07    100.00   1013  2014  3003  4017
 11   *********  6.726E+06    100.00   1013  2014  3011  4017
 12   *********  6.678E+06    100.00   1013  2014  3005  4017
 13   *********  5.485E+06    100.00   1013  2014  3023  4017
 14   *********  5.276E+06    100.00   1020  2014  3023  4017
 15   *********  5.687E+04    100.00   1024  2014  3023  4017
 16   *********  8.648E+01    100.00   1024  2014  3012  4017
  ?   *********  3.746E+00    100.00   1024  2014  3017  4017
  ?   *********  3.434E+00    100.00   1024  2014  3015  4017
  ?   *********  3.399E+00    100.00   1022  2014  3015  4017
  ?   *********  3.396E+00    100.00   1022  2006  3015  4017
  ?   *********  3.396E+00    100.00   1022  2006  3015  4017

(? = not legible in this copy; the individual rejection columns STATIC, RES,
LAG, FURATE and DAMP and the mode column are not legible.)


Table XII — Best design obtained using directed search

AUTOMATED DESIGN RESEARCH PROGRAM
JANUARY 15, 1969

**** DEFINITION OF LOAD ****

                         MAX          MIN
INERTIA (GM-CMSQR)       9.001E+02    7.000E+02
FRICTION (OZ-IN)         8.000E-01    4.000E-01

**** PART NUMBERS OF COMPONENTS SELECTED ****

FOLLOWUP    AMPLIFIER    MOTOR-GEN    GEAR TRAIN
1009        2003         3002         4014

**** PERFORMANCE ****

MAXIMUM      MINIMUM      SPEC LIMIT   PCT REJ
4.071E+03    2.928E+03                           TOTAL INERTIA (GM-CMSQR)
4.578E+03    2.056E+03                           TORQUE CONSTANT (OZ-IN/RAD)
6.169E+01    2.717E+01                           DAMPING COEFFICIENT (OZ-IN-SEC)
4.895E+01    3.218E+01                           NATURAL FREQUENCY (HERTZ)
3.033E-01    4.712E-02      0.350      0.00      STATIC ACCURACY (DEG)
5.457E-02    2.666E-02      0.300      0.0       RESOLUTION (DEG)
5.295E+00    3.091E+00      5.000      1.40      LAG FOR 300. DEG/SEC RAMP (DEG)
9.702E+02    6.192E+02    300.000      0.00      FOLLOWUP RATE (DEG/SEC)
2.227E+00    1.233E+00      0.500      0.00      DAMPING RATIO

**** COST SUMMARY ****

  1.41  PCT REJECTION (UPPER BOUND)
  1.41  PCT REJECTION (INDEPENDENT)
  1.40  PCT REJECTION (LOWER BOUND)

200.00  LABOR COST
169.00  PARTS COST
374.27  TOTAL COST (USING R-UPPER BOUND)

E SIGNIFIES CONVENTIONAL POWER-OF-TEN NOTATION
Table XI — Local minimums obtained for design
example

                                      Component Part Numbers
Frequency   Cost       Rejection
    1                  100%
    1       -          100%
    1       $410.01      0.25%       1023   2008   3006   4014
    1       $394.63      1.43%       1009   2003   3002   4002
    1       $394.29      0.07%       1009   2012   3006   4014
    9       $374.27      1.41%       1009   2003   3002   4014
    1       Search terminated as iterations exceeded maximum allowed
            of 300


results in a total system cost higher than the local 
minimum being investigated, the latter is the absolute 
minimum. 

For the above $374.27 local minimum there are
17,835 combinations which satisfy (33). This number,
although large, is much less than the 781,250 total possible
combinations, and it becomes a practical value
when one considers the solution time. The 17,835
combinations were, therefore, analyzed (at a cost of
1.7 hours of computer time compared to 74.4 hours for
a complete exhaustive search) and each resulted in a
total system cost > $374.27, thus proving the latter
to be the absolute minimum.
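The pruning rule in (33) amounts to enumerating only those combinations whose cost floor (labor plus parts, with no rejects) lies below the local minimum. A minimal Python sketch of that filter follows. The function name `candidates_below` and the small libraries are hypothetical; in particular the amplifier costs are invented for illustration, while the two follow-up costs are those printed in Table VI. The last lines check that the reported 1.7 hours is consistent with analyzing 17,835 of the 781,250 combinations at the exhaustive-search rate.

```python
from itertools import product


def candidates_below(cost_lists, labor_cost, local_minimum):
    """Yield (part numbers, cost floor) for combinations satisfying (33).

    cost_lists : one dict per component library, part number -> cost
    """
    for combo in product(*(costs.items() for costs in cost_lists)):
        parts = tuple(part for part, _ in combo)
        floor = labor_cost + sum(cost for _, cost in combo)
        if floor < local_minimum:        # only these need full analysis
            yield parts, floor


followup = {1009: 22.0, 1015: 16.0}      # costs from Table VI
amplifier = {2003: 100.0, 2004: 200.0}   # invented costs, illustration only
survivors = list(candidates_below([followup, amplifier], 200.0, 330.0))
print(survivors)

# Consistency check on the reported computer times
estimated_hours = 74.4 * 17835 / 781250
print(round(estimated_hours, 1))
```

Any combination excluded by the filter cannot beat the local minimum, because rejects can only raise its cost above the floor.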
Computer-aided design for custom
integrated systems


by W. K. ORR 

The Singer Company-Friden Research Center 
Palo Alto, California 


INTRODUCTION 

The computer-aided design (CAD) system described 
herein was developed to aid in the design of digital 
systems to be implemented by custom integrated cir- 
cuits (CIC) and multi-chip hybrid custom integrated 
systems (CIS). The terms MSI/LSI are avoided here 
due to the general confusion which exists in the litera- 
ture as to what constitutes an MSI/LSI circuit. The 
CAD system philosophy is that each CIC is imple- 
mented from a selected set of “library elements”. 
This design approach results in some size inefficiencies, 
compared with manual designs, but provides many 
advantages, of which flexibility and a shortened de- 
sign cycle are the most important. This CAD system 
captures fundamental design information in a machine- 
readable form early in the design process, thus maxi- 
mizing potential computer assistance and minimizing 
costly and time-consuming errors. This paper contains 
an overview of the complete CAD system, highlighting 
its more distinctive features. The complete system 
has been operational on a 360/30 for several months, 
and specific experiences with it can therefore be dis- 
cussed. 

Overview 

Following are the major sections of the complete 
CAD system, and the distinctive features to be dis- 
cussed more fully in later sections. 

Logic design 

These programs convert a description of the logical 


function required of a CIS into the corresponding 
functional logic. 

Distinctive features include : 

1. Logicspec, a special register-transfer source 
language, 

2. Compiled functional logic independent of hard- 
ware implementation, 

3. Designer control of factoring and gathering. 

Logic simulation 

These programs provide a complete simulated en- 
vironment for the CIS, and a bit-simulation of response 
to input pattern sequences. 

Distinctive features include:

1. Random access and cyclic memory,

2. Read-only-memory, 

3. Time-dependent and conditional input signals, 

4. Logic level statistics, 

5. Selective output facilities. 

Logic conversion 

These programs convert the functional logic to the 
logic family selected for hardware implementation, and 
create the design data-base. 

Distinctive features include: 

1. Efficient NAND/NOR logic generation, 

2. Wired-OR

Logic element design

These programs facilitate origination and revision
of the library elements used in final system
implementation.

Distinctive features include:

1. Graphospec, a special graphic source language,

2. Logic element library,

3. Artwork generation facility.

Partitioning 

These programs enable the designer to explore al- 
ternative partitionings, and post the final locations 
of all logical elements to the design data-base. 

Distinctive features include:

1. Minimization of total pad-count for the CIS, and

2. Extensive designer/computer interaction.

Element selection 

These programs select the smallest eligible element 
meeting all the circuit requirements. 

Distinctive features include:

1. Automatic insertion of gate expanders and intra- 
CIS pads, and 

2. Capabilities for handling variable size elements. 

Element interconnection 

This program establishes the X-Y interconnection 
routing. 

Because of the nature of this paper, references are
not cited in the text; instead, an annotated bibliography
is given at the end of the paper.

Logic design 

The initial input to the CAD system, as shown in
Figure 1, consists of a set of Logicspec statements.
Logicspec is a language which has been developed to
simplify the task of describing a logic design in
machine-readable form.

The Logicspec language permits the designer to
avoid many of the burdensome details of logic design.
These details are filled in by the Logicspec Translator,
which converts a Logicspec description into a complete
set of design equations. These design equations
are essentially Boolean equations, the operators being
AND, OR and NOT. However, they are written in a
modified form of Polish notation. In this notation the
equation

A = BCD + EF

appears as

A = ((B C D •) (E F •) +)

Figure 1 — Computer-aided design system for CIS

An important characteristic of this notation is that 
each operator corresponds to a gate in an AND/OR 
implementation of the equation. This greatly simplifies 
those programs in the CAD system which must operate 
on these equations. 
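The one-operator-per-gate property can be illustrated with a short sketch. It is not the authors' translator; the function names are hypothetical, and "." is accepted alongside "•" as the AND operator purely for typing convenience.

```python
AND_OPS = {".", "•"}   # assumption: "." is accepted as a stand-in for the dot


def parse(text):
    """Parse a modified Polish expression into nested (op, operands) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def group():
        nonlocal pos
        pos += 1                      # skip "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(group())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        *operands, op = items         # the operator is written last
        return (op, operands)

    return group()


def evaluate(node, values):
    """Evaluate a parsed expression for a dict of Boolean signal values."""
    if isinstance(node, str):
        return bool(values[node])
    op, operands = node
    results = [evaluate(o, values) for o in operands]
    return all(results) if op in AND_OPS else any(results)


def gate_count(node):
    """Count operators; each one is a gate in an AND/OR implementation."""
    if isinstance(node, str):
        return 0
    return 1 + sum(gate_count(o) for o in node[1])


tree = parse("((B C D .) (E F .) +)")
print(gate_count(tree))   # two AND gates feeding one OR gate
```

Because the parenthesized groups map directly onto gates, programs that walk the equations need only a recursive traversal of this kind.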

Since the Logicspec language is similar to other 
register transfer languages which have been proposed, 
only some of its more distinctive features will be 
discussed here; a full description will be published 
elsewhere. 

Flip-flops are the only memory elements dealt with 
directly in a Logicspec description. Memory systems 
such as core and delay line memories are treated as 
systems interfaced to the logic design through signal 
lines. The description of these memories is deferred 
until simulation, where the simulator controller governs 
the manner in which the various memories interact,
via the signal lines, with the logic design.

Flip-flops are introduced in a description through
the use of a Flip-Flop Collection declaration such as 

FFC 12 A(1,8*),B(1*,8) 

The foregoing indicates that the collections A and B 
both contain eight type-12 flip-flops. The “*” identifies 
the high order end of the collections for decoding ref- 
erences such as “A = 2”. The type code (“12”) is used 
by the simulation system to determine how the associated
flip-flops are to be simulated and by other programs
to determine how the flip-flops are to be implemented.
This information is contained in an on-line
disk library which can be expanded as required. Each
flip-flop declared may have a maximum of five input
and two output terminals:

A(1)/R, A(1)/S, A(1)/T, A(1)/P, A(1)/C, A(1)
and A(1)'.

The function of each of these terminals is determined by
the information contained in the corresponding library
entry.

The bulk of a Logicspec description consists of a
set of statements which specify that if a certain condition
C is true then an action S, or set of actions S_1, ...,
S_n, occurs.

The statement form actually used by the designer
is the more concise conditional statement:

IF C THEN S

or when several actions are involved:

IF C THEN BEGIN S_1; S_2; ...; S_n END

The actions prescribed may include such operations
as SET A, CLEAR B, C → D (transfer C to D), and
INHIBIT TX. Conditional statements can be nested,
i.e., S_i could be another conditional statement. The
condition C may be any Boolean expression formed
using the operators + (OR), • (AND) and ' (NOT).
It is permissible to describe an entire design using only
Boolean equations; one need not use conditional statements
if he so desires.

Most designers who have used the system feel that 
the conditional statement is rather cumbersome, and 
generally prefer to use an alternate form referred to 
as a qualification statement. This statement takes the 
form: 

*C: 


All subsequent statements are conditioned by C 
until another qualification statement occurs which 
overrides the condition C. To illustrate this consider 
the following: 

FFC 12 A(1*, 3), B(1*, 2), C(1*, 2);

*(A = 3): 2 → B;

CLEAR C;

*(A = 0): B → C/S;

The modified Polish equations produced by the Logicspec
translator for the above description are:

C(1)/C = *(A = 3)*

C(1)/S = (*(A = 0)* B(1)/1 •)

C(2)/C = *(A = 3)*

C(2)/S = (*(A = 0)* B(2)/1 •)

B(1)/R = *(A = 3)*

B(2)/S = *(A = 3)*

*(A = 3)* = (A(1)/0 A(2)/1 A(3)/1 •)

*(A = 0)* = (A(1)/0 A(2)/0 A(3)/0 •)
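As a rough sketch of how a decoding reference such as (A = 3) expands into a product of flip-flop outputs, the following assumes the high-order end is the lowest index, as in A(1*, 3). The function name and argument layout are hypothetical, and "." stands in for the AND dot.

```python
def decode_condition(name, value, low, high, high_order_first=True):
    """Return the modified Polish product term for `name = value`.

    The collection holds flip-flops numbered low..high. When
    high_order_first is true, index `low` is the most significant bit,
    as marked by "*" in a declaration such as A(1*, 3).
    """
    indices = list(range(low, high + 1))
    width = len(indices)
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the collection")
    bits = format(value, "0{}b".format(width))   # most significant bit first
    if not high_order_first:
        bits = bits[::-1]
    terms = ["{}({})/{}".format(name, i, b) for i, b in zip(indices, bits)]
    return "(" + " ".join(terms) + " .)"


print(decode_condition("A", 3, 1, 3))
print(decode_condition("A", 0, 1, 3))
```

Each term names the 0-output or 1-output of one flip-flop, which is the form the last two equations of the example take.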

Qualification statements may be nested using a form 
of subscripting: 

*C_1: S_1;

      S_2;

  *1C_2: S_3;

         S_4;

  *1C_3: S_5;

*C_4: S_6;

In the above, S_1 and S_2 are conditioned by C_1, S_3 and
S_4 by C_1 • C_2, S_5 by C_1 • C_3, and S_6 by C_4 only. Logicspec
is a free-form language, thus the indentations above
are for documentation only.
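The scoping rule for nested qualification statements can be sketched with a stack. The tuple encoding of the source text below is an assumption made for illustration; level 0 stands for "*C" and level 1 for "*1C".

```python
def effective_conditions(items):
    """Map each statement to the list of conditions that qualify it.

    `items` is a sequence of ("qual", level, condition) and
    ("stmt", name) entries in source order. A qualification at level k
    replaces the condition at level k and discards any deeper ones.
    """
    stack = []
    result = {}
    for item in items:
        if item[0] == "qual":
            _, level, condition = item
            del stack[level:]          # override this level and all deeper ones
            stack.append(condition)
        else:
            result[item[1]] = list(stack)
    return result


program = [
    ("qual", 0, "C1"), ("stmt", "S1"), ("stmt", "S2"),
    ("qual", 1, "C2"), ("stmt", "S3"), ("stmt", "S4"),
    ("qual", 1, "C3"), ("stmt", "S5"),
    ("qual", 0, "C4"), ("stmt", "S6"),
]
conditions = effective_conditions(program)
for name, conds in conditions.items():
    print(name, " . ".join(conds))
```

The result reproduces the conditioning described in the text: the level-1 qualifications are ANDed with C1, and C4 overrides everything.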

The structure of a Logicspec description contains 
important “clues” which are used by the translator 
to produce efficient logic. As an example, the majority 
of common control conditions are described using 
qualification statements. Referring to the above, C_1
is a common control condition in that it controls the
actions S_1 and S_2 and, in conjunction with C_2, S_3 and
S_4. The Logicspec translator searches all qualification
statements for such common conditions, and may
either duplicate the gates involved every time the
condition is used or generate a new signal which is
used wherever the condition appears. This decision 
is under the control of the designer, who specifies the 
minimum number of times a condition must be used 
before a new signal is generated. The designer can also 
control the generation of new signals based on how the 
condition is used and the number of gates required to 
generate the condition. 

The designer can use the flexibility described above to 
reduce the time required to simulate a design by in- 
structing the translator to generate a new signal for 
every common condition. This generally reduces the 
number of gates in a design and thus the gate evalua- 
tion time during simulation. 
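The effect of the designer's usage threshold can be sketched as follows. This is a simplified model, not the translator's algorithm: each equation is a condition ANDed with optional extra literals, one gate is charged per product of two or more literals, and the generated signal name *K1* is hypothetical.

```python
from collections import Counter


def factor(equations, min_uses):
    """Factor out conditions that occur at least `min_uses` times.

    `equations` maps a target to (condition, extra), both tuples of
    literals that are ANDed together. Returns the rewritten equations
    plus a dict of generated signals.
    """
    uses = Counter(cond for cond, _ in equations.values())
    signals = {}
    rewritten = {}
    for target, (cond, extra) in equations.items():
        if uses[cond] >= min_uses:
            if cond not in signals:
                signals[cond] = "*K{}*".format(len(signals) + 1)
            rewritten[target] = ((signals[cond],), extra)
        else:
            rewritten[target] = (cond, extra)
    generated = {name: (cond, ()) for cond, name in signals.items()}
    return rewritten, generated


def gate_count(*groups):
    """One AND gate per product of two or more literals."""
    return sum(
        1
        for group in groups
        for cond, extra in group.values()
        if len(cond) + len(extra) >= 2
    )


A3 = ("A(1)/0", "A(2)/1", "A(3)/1")
design = {
    "C(1)/C": (A3, ()),
    "C(2)/C": (A3, ()),
    "B(1)/R": (A3, ()),
    "B(2)/S": (A3, ()),
}
duplicated, none = factor(design, min_uses=5)    # threshold never reached
shared, generated = factor(design, min_uses=2)   # one shared decode gate
print(gate_count(duplicated, none), gate_count(shared, generated))
```

In this model a shared signal saves gates when the condition alone drives each input; when every use also ANDs in other literals, sharing can add a gate, which is one reason the choice is left to the designer.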

The basic Logicspec language is very simple, but
means are provided for extending the language through 
the use of subsystem definitions. A subsystem definition 
for the four bit ring-counter pictured in Figure 2 is 
given in Figure 3. 

In Figure 3, line three is a signal collection declaration
for the single rail bus OUT (double rail bus declarations
begin with SIGC/2). Line four indicates that
the words COUNT and SETO are to be added to the 
basic Logicspec vocabulary whenever a RINGC is 
used. Lines five and six simply describe fixed connec- 
tions. 

Once a subsystem has been defined and added to
the subsystem library, the designer may use it in one
of two ways: he may INCLUDE it or simply SIMULATE
it as part of his design.

The INCLUDE option specifies that the actual 
text describing the subsystem is to be passed to the 
Logicspec translator and processed along with the 
text describing the rest of the design, in much the same 
way as a macro call functions in programming languages. 

The SIMULATE option makes the logical description
of the subsystem available for simulation purposes
only; the rest of the logic design must interact with
the subsystem through its input/output terminals.
The subsystem logic does not become part of the system
being designed: subsystem simulation information
is passed directly to the simulation program, and is
not processed by the Logicspec translator.

Figure 2 — Four bit ring-counter

The same subsystem may be included and simulated 
in the same design. For example 

INCLUDE RINGC A(AO), B(BO);
SIMULATE RINGC C(CO);

indicates that two ring-counters, A and B, whose output
buses are AO and BO respectively, are to be included
in a design whereas C is only to be simulated.

The efficiency of the logic produced by the Logic- 
spec translator has been evaluated, using designs for 
two systems which were in production before Logic- 
spec was developed. These two systems were described 
in Logicspec, processed through the translator, and 
the resultant logic compared against that in the pro- 
duction systems. In both cases the logic produced 
by the translator contained five percent more gates 
than the production designs. 

Logic simulation 

The electronics industry increasingly uses logic
simulation to eliminate logic design errors before committing
a design to hardware. Many designers, however,
insist on building breadboards to isolate lead-length
and other circuit problems. In some cases, this
is still a valid position. However, whenever the product
will ultimately use CIC's, a breadboard serves only
to correct logic errors, simply because of the difference
between the breadboard and final product technologies.

DEFINE RINGC (OUT);        (1)
FFC 12 A(1*, 4);           (2)
SIGC OUT(1*, 4);           (3)
OPERATION COUNT, SETO;     (4)
A → OUT;                   (5)
A(4)' → A(1);              (6)
*SETO: CLEAR A;            (7)
*COUNT: SHRA;              (8)
END;                       (9)

Figure 3 — Subsystem definition for ring-counter

The creation of a logic simulation program begins 
with the simulator ordering program. This program 
orders the design equations, in preparation for the 
simulator compiler which produces the simulation 
code. The equation order, E_1, E_2, ..., E_n, produced
by the ordering program has the following property:
the variable defined by equation E_i is a function of
flip-flop outputs, system inputs (external inputs) or
variables which have been defined in the preceding
equations E_1, ..., E_{i-1}. In addition a level list is produced
which gives the number of gate delays in the definition 
of each signal. This list is used by the designer to iso- 
late signal paths which contain excessive delays. These 
may be eliminated by changing the Logicspec descrip- 
tion. 

Whenever an equation occurs which defines a signal 
as a function of itself the program will fail to order it. 
At the completion of the ordering process a list of all 
unordered equations is produced. The designer must 
change his description such that every equation can 
be ordered before proceeding to the simulator compiler. 
From this the reader may wonder how flip-flops built 
from cross-coupled gates (latches) are processed. The 
answer is that the designer uses a flip-flop which has 
the characteristics of a latch, but he does not write 
the equations which describe the latch itself. 
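The ordering property, the level list and the list of unordered equations can be sketched together. The dependency-set representation is an assumption, and the sketch charges one gate delay per equation, whereas the actual level list counts the gate delays within each definition.

```python
def order_equations(equations, primary):
    """Order equations so every input is defined before it is used.

    `equations` maps a signal to the set of signals it depends on;
    `primary` holds flip-flop outputs and system inputs. Returns
    (ordered, levels, unordered), where levels[s] approximates the
    number of gate delays in the definition of s.
    """
    levels = {name: 0 for name in primary}
    ordered = []
    pending = dict(equations)
    progress = True
    while pending and progress:
        progress = False
        for signal in sorted(pending):
            inputs = pending[signal]
            if all(i in levels for i in inputs):
                levels[signal] = 1 + max((levels[i] for i in inputs), default=0)
                ordered.append(signal)
                del pending[signal]
                progress = True
    return ordered, levels, sorted(pending)


equations = {
    "Z": {"A", "D"},
    "A": {"L", "K", "M"},
    "G": {"M", "N"},
    "Q": {"Q", "M"},        # defined as a function of itself
}
ordered, levels, unordered = order_equations(equations, {"D", "L", "K", "M", "N"})
print(ordered, levels["Z"], unordered)
```

The self-referential equation for Q is never ordered and ends up in the unordered list, which is the condition the designer must remove before compiling the simulator.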

The simulator compiler generates code to evaluate 
each equation in the order specified by the ordering 
program. One pass through this code may represent 
one simulated clock time; the equivocation is clari- 
fied by the discussion of the simulator controller. 

The simulator controller simulates all memory ele- 
ments in a given CIS design, monitors various signals 
to find predesignated error conditions, and applies 
time-varying input signals so as to provide a realistic 
simulation of the environment in which the CIS must 
operate. A set of powerful commands has been de- 
veloped to facilitate the designer’s interaction with 
the simulator, and to maximize the information he 
receives about the simulation results. Concise state- 
ments are provided for describing wave forms which 
are to be applied to the machine's inputs (system in- 
puts). Commands are provided to control the display 
of selected signals and flip-flops during simulation, 
as well as the status of any delay line or core memories 
involved in the design. 


The flip-flop control procedure used by the simu- 
lator controller is outlined in Figure 4. A pass through 
the simulation code will define each signal and flip- 
flop input. If any asynchronous (non-clocked) flip-flop 
changes are required the controller makes these changes 
and another pass is made through the simulation code 
to propagate the effect of these changes. The controller
counts the number of times recycling is required be- 
tween clock times. If this count exceeds a limit specified 
by the designer, an error message is generated, thus 
permitting detection of any oscillating conditions 
which may be present in a given design. When there 
are no more asynchronous changes, a clock time is de- 
fined and all clocked flip-flop changes are made. This 
procedure for handling asynchronous flip-flop changes 
is also used to handle asynchronous changes in all other 
types of memories. 
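The recycling procedure can be sketched as a loop around one pass of the simulation code. The three callables are hypothetical stand-ins for the compiled simulation code and the library-driven flip-flop models; only the control flow follows the description above.

```python
def simulate_clock(one_pass, state, async_updates, clocked_updates, limit):
    """Advance the simulation by one clock time.

    one_pass(state) -> signals             one pass through the ordered code
    async_updates(state, signals) -> dict  asynchronous changes requested
    clocked_updates(state, signals) -> dict clocked flip-flop changes
    Returns the number of recycles needed before the clock time.
    """
    recycles = 0
    signals = one_pass(state)
    while True:
        changes = {k: v for k, v in async_updates(state, signals).items()
                   if state.get(k) != v}
        if not changes:
            break
        recycles += 1
        if recycles > limit:
            raise RuntimeError("recycle limit exceeded: possible oscillation")
        state.update(changes)
        signals = one_pass(state)       # propagate the asynchronous changes
    state.update(clocked_updates(state, signals))
    return recycles


def one_pass(state):
    return {"set": state["x"], "nq": 1 - state["q"]}


def settle(state, signals):
    return {"q": signals["set"]}        # settles after one recycle


def oscillate(state, signals):
    return {"q": signals["nq"]}         # never settles


def no_clocked(state, signals):
    return {}


state = {"x": 1, "q": 0}
print(simulate_clock(one_pass, state, settle, no_clocked, limit=4))
```

When no asynchronous change is requested the loop exits after the first check, which matches the remark that the capability costs almost nothing when it is not used.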

Simulation running time is clearly increased when- 
ever asynchronous events occur. However, in the ab- 
sence of asynchronous events there is virtually no run 
time overhead associated with the capability to handle 
such events. As regards running time, a logic system 
containing 100 flip-flops and 600 gates is simulated 
at a rate of 18 clock periods per second. 



Figure 4 — Simulator controller

Logic conversion

As discussed earlier, the logic produced by the 
Logicspec translator consists of a set of Boolean equa- 
tions. Generally our logic is implemented in either 
NANDS or NORS, thus the design equations must 
be converted to one of these logic families. 

The Logic Conversion Program is a one pass, table 
driven program capable of converting the design 
equations into either NANDS or NORS. When strap- 
ping (OR-tieing) is permitted, the program will use 
it when it yields a savings in gates and/or logic levels. 

One of the unique features of this program is the
order in which it converts the design equations. The
conversion produced for the i-th equation can be done
efficiently (in terms of the number of gates required)
only when it is known how the signal defined by this
equation has been used: positively, negatively or both.
In other words, to produce an efficient conversion for
equation i one must first produce a conversion for
each equation which uses the signal defined by equation
i. On the surface this seems like a difficult problem,
or at least a time consuming task. As it turns out, however,
all of the necessary information is produced by the
ordering program used in simulation.

Recall that the simulator ordering program produces
the design equation ordering E_1, E_2, ..., E_n, where every
signal in equation E_i has either been defined by a
preceding equation or is a flip-flop output or system
input. The conversion program converts the design
equations in the order E_n, E_{n-1}, ..., E_1. That is, the first
equation converted is the one which appears at the
end of the list produced by the simulator ordering
program.

As the conversion is done, the program maintains
a “usage list” which indicates how each signal has
been used. As an example, if the equation A = B + C
is converted to NANDS the program records the fact
that B and C have been used negatively, since the
NAND conversion for this equation is A = B' @ C',
where @ represents a NAND gate. Thus, we see that
when the program reaches equation E_i the usage list
entry for the signal V_i defined by E_i contains all of the
information as to how V_i has been used. Returning to
the previous example, if B was used only in the equation
which defines A then the conversion program would
produce an equation for B' rather than B.

The table used by the conversion program to con- 
vert the design equations to NANDS, assuming a 
strapping capability, is shown below. This table is 
somewhat simpler than others which have appeared 
in the literature. 

The entries in Table I give the NAND gate replace- 
ments for each Boolean operator as a function of the 
polarity that is required at a given level in the logic 
network. The “positive”, “negative” entries which 
appear in the table are the polarities required on the 
inputs to the gate(s) which replace the Boolean op- 
erator. “Strap” implies that under the indicated con- 
ditions strapping may be used. Whenever the NOT 
operator occurs, it is simply removed with the indicated 
polarity reversal. 
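The replacement rules just described can be sketched as a recursive conversion. This is an illustration under stated assumptions rather than the table-driven program itself: expressions are nested tuples, a negative-polarity leaf is produced with a one-input NAND, and strapping is applied only when every input is itself a gate output.

```python
from itertools import product


def to_nand(node, polarity="pos", strap=False):
    """Convert an AND/OR/NOT tree to NAND ("@") and strap ("$") operators."""
    if isinstance(node, str):
        return node if polarity == "pos" else ("@", [node])
    op, args = node
    if op == "not":                      # eliminated, with polarity reversal
        return to_nand(args[0], "neg" if polarity == "pos" else "pos", strap)
    input_polarity = "pos" if op == "and" else "neg"
    inputs = [to_nand(a, input_polarity, strap) for a in args]
    if (op == "and") != (polarity == "pos"):
        return ("@", inputs)             # single-gate cases
    if strap and all(isinstance(i, tuple) for i in inputs):
        return ("$", inputs)             # wired connection replaces two gates
    return ("@", [("@", inputs)])        # two-gate cases


def evaluate(node, values):
    """Evaluate either form of tree for a dict of Boolean values."""
    if isinstance(node, str):
        return values[node]
    op, args = node
    results = [evaluate(a, values) for a in args]
    if op in ("and", "$"):
        return all(results)
    if op == "or":
        return any(results)
    if op == "not":
        return not results[0]
    return not all(results)              # "@" is a NAND gate


# Z = (((B A .) C +)' D .), required as a negative polarity output
z = ("and", [("not", [("or", [("and", ["B", "A"]), "C"])]), "D"])
z_neg = to_nand(z, "neg", strap=True)
print(z_neg)
for bits in product([False, True], repeat=4):
    values = dict(zip("ABCD", bits))
    assert evaluate(z_neg, values) == (not evaluate(z, values))
```

The exhaustive check at the end confirms that the converted network computes the complement of the original expression, as required for a negative-polarity signal.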

Figures 5a and 5b illustrate how the conversion
table is used. In Figure 5a the implication is that a
conversion is to be produced for H rather than H'; thus
the first conversion table access is made with (Polarity,
Boolean operator) = (POSITIVE, AND).

To insure that conversion is done correctly, the 
designer must supply a list of the system inputs and 
outputs with their required polarities. In addition, 
he must specify the polarity required at each flip-flop 
input. 

Figure 7 shows a conversion produced for the design
equations given in Figure 6. The symbol $ is used to
indicate strapping. Each operator, @/$, is followed
by an operator number. The signal Z, which appears
as *Z'* in Figure 7, was produced rather than Z because
this signal was described separately to the conversion
program as a negative polarity system output.

TABLE I — NAND conversion table

                        Boolean Operator
Polarity      AND            OR             NOT

POSITIVE      @@ (STRAP)     @              ELIMINATE
              POSITIVE       NEGATIVE       NEGATIVE

NEGATIVE      @              @@ (STRAP)     ELIMINATE
              POSITIVE       NEGATIVE       POSITIVE

Figure 5a — H = (A • B + C • I) + K • F) • G

1  Z = (((B A .) C(1)/1 +)' D(1)/1 .)
2  A = ((L K .) M +)
3  G = (M N +)

Figure 6 — Design equations

1       *M'* = (M @10)
2       *N'* = (N @11)
3  (1)  *Z'* = (((B A @4) (C(1)/1 @3) $2) D(1)/1 @1)
4  (2)  *A'* = ((L K @9) (M @8) $7)
5       A = (*A'* @6)
6  (3)  G = (*M'* *N'* @5)

Figure 7 — Implementation logic

Generally when AND-OR-NOT logic is converted
to NAND or NOR logic, additional levels are introduced.
The designer normally will pass the logic produced
by the conversion program back through the
ordering program to determine if excessive logic levels
have been introduced. If there are excessive levels,
the designer must eliminate them by changing the
original Logicspec description.

To facilitate further processing, implementation
equations such as those of Figure 7 are compacted into
a file which resembles a wiring list. This file, referred
to as the design data base, is used by all subsequent
programs.

Logic implementation

Following logic conversion, artwork must be generated
to produce the CIS which implements the logic
contained in the design data base. In part this involves
the selection of an IC equivalent for each gate and
flip-flop in the data base. The central information
source used in establishing these equivalences is the
Element Library. Since this library is used by all subsequent
programs, it is appropriate to introduce it at
this time.

The library elements important for the following 
discussion are gates (NAND/NOR), flip-flops, line 
drivers, and expanders. Although the Element Library 
contains a much broader range of digital elements, the 
CAD system is presently only capable of utilizing 
these simple logic elements to implement a CIS. The 
effectiveness of the CAD system will increase as the 
complexity and variety of library elements that can 
be used to implement a CIS is increased. 

The library entry for each element contains all of 
the information required to produce the artwork for 
the several mask levels for the given element. This in- 
formation is stored in a disc file in relocatable form so 
that an element may be positioned at any location on 
a chip in one of four possible rotations, and optionally 
as a mirror image. Dimensions, fan-in and fan-out 
capabilities and logic type are included in each ele- 
ment entry. 

A complete set of programs accomplishes element 
library maintenance. Most important of these are 
the programs which the element designer uses in the 
creation and modification of library elements. Ele- 
ments are generally built up in a bootstrap fashion. 
Resistors, diodes, and transistors are described to the 
system, in a special Graphospec language, as a collection
of rectangles. In this language the description
of a rectangle consists of the coordinates of one ver- 
tex, the length of the associated diagonal and a mask 
layer designation. More complex elements such as 
gates are described as collections of these elements. 
There is virtually no limit to the complexity of the 
elements that can be built up in this fashion. 

The computing equipment currently available at 
the Research Center for use in CAD does not include 
a graphic display terminal. In anticipation that one
will be available in the future the Graphospec language 
was designed for use on such a terminal. 

Partitioning 

The logic contained in the data base, representing 
that to be implemented by a CIS, may exceed the 
capacity of a single IC chip. It is then necessary to 
partition the logic into groups (partitions), each of 
which can be implemented by a single IC chip. It is 
important to note that partitioning is accomplished 
before IC equivalents have been selected for each gate 
and flip-flop in the data base. One reason for this is
that there can be a significant size difference between
gates and flip-flops whose inputs are generated and
outputs used on the same chip and those whose inputs
(outputs) originate (terminate) on a different chip.
Thus, it is not possible to know exactly the area required
for each element until partitioning has been done.

The approach taken toward partitioning was to
develop a set of manipulation and reporting programs
that the designer can put together to implement a wide
variety of partitioning strategies. Some understanding
of what is done by these programs can be gained from
the following brief descriptions.

The input program 

Calculates approximate areas for each logic module. 
A given logic module consists of either a flip-flop, a 
collection of flip-flops, or the gates required to implement
a design equation. As an illustration, the
equation A = ( (B C @ 3) (D E @ 2) @ 1) is treated
as a four input logic module which has one output (cf.
Figure 8). The area of this logic module is the sum of
the approximate areas for the gates which make it up.

The locate program 

Places named logic modules on specified chips. Additionally,
the designer can specify that a module is
to be locked in place, so that it cannot be moved from its
designated location by subsequent programs. The
name of a logic module is defined to be the name of
the output signal (the name of the logic module in
Figure 8 is A). Flip-flop outputs bear the name of the
flip-flop.

Figure 8 — Logic module defined for partitioning purposes

The randomize program 

Randomly distributes all logic modules which have 
not been placed over the chips which the designer 
designates as available. The designer can elect to 
begin partitioning with any number of chips. 

The weld program 

Creates a new logic entity by associating any 
specified set of logic modules together. For example, 
one might weld the reset logic for a flip-flop collection 
to the flip-flop collection itself. 

The reduction program 

Moves logic modules, or logic module sets, between 
chips whenever a move will result in a reduction in 
the total number of interconnection pads required 
within the CIS. Moves are made subject to the area 
and pad limitations the designer has given for each 
chip. 
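One way such a pad-reducing move could work is sketched below as a greedy search. The pad model (one pad on every chip touched by a signal that spans chips) and all names are assumptions, and only the area limit is enforced; the per-chip pad limit mentioned above is omitted for brevity.

```python
def pad_count(nets, location):
    """Pads needed: one per chip touched by each net that spans chips."""
    total = 0
    for modules in nets.values():
        chips = {location[m] for m in modules}
        if len(chips) > 1:
            total += len(chips)
    return total


def reduce_pads(nets, location, area, capacity, locked=()):
    """Greedy single-module moves that lower the total pad count."""
    location = dict(location)
    improved = True
    while improved:
        improved = False
        for module in sorted(location):
            if module in locked:
                continue
            best = pad_count(nets, location)
            home = location[module]
            for chip in sorted(capacity):
                if chip == home:
                    continue
                used = sum(area[m] for m, c in location.items() if c == chip)
                if used + area[module] > capacity[chip]:
                    continue             # move would exceed the area limit
                location[module] = chip  # trial move
                count = pad_count(nets, location)
                if count < best:
                    best, home, improved = count, chip, True
            location[module] = home      # keep the best position found
    return location


nets = {"n1": ["A", "B"], "n2": ["B", "C"], "n3": ["C", "D"]}
start = {"A": 1, "B": 2, "C": 1, "D": 2}
area = {"A": 1, "B": 1, "C": 1, "D": 1}
capacity = {1: 3, 2: 3}
final = reduce_pads(nets, start, area, capacity)
print(pad_count(nets, start), pad_count(nets, final))
```

Every accepted move strictly lowers the pad count, so the loop terminates; like any greedy scheme it may stop at a local minimum, which is where designer interaction comes in.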

The display program 

Produces a chip interconnection table which gives
the name of each “back-plane” signal, the number of
the chip which generates the signal and the number(s) 
of the chip(s) the signal is connected to. This is only 
one of the several reports designed to aid the designer 
in executing his partitioning strategy. 

Frequently an “optimum” partitioning job can be
done only if the designer is willing to change his
design. Gates can often be traded for pads, reducing
system cost; the duplication of registers, especially
those that are extensively decoded, may also reduce cost.
The cost effectiveness of trade-offs such as these will
of course change as packaging techniques improve.
However, the situation will still arise when for want
of a pad a chip must be added to a CIS.

To simplify the task of implementing design changes 
made to improve partitioning results, a facility is 
provided which allows the designer to obtain a “lo- 
cation deck” at any time during the partitioning 
process. Each card in this deck contains a logic module 
name and the number of the chip on which the module 
is located. The cards are punched in the format ac- 
cepted by the locate program. 

All design changes must be made to the associated 
Logicspec description — this fundamental design docu- 
ment is always kept up to date. Once a change has 
thus been made and a new data base created the lo- 
cation deck is processed to obtain the new partitioning 
results. If design changes eliminated certain logic 
modules the associated cards in the location deck are 
rejected. Further, if logic modules were added the 
designer is required to include cards for these in the 
location deck. 

Element selection 

Chip locations for each logic module established by 
the partitioning process are posted to the design data 
base. The element selection program then selects 
the library elements that are to be used to implement 
the logic on each chip. Selection is controlled by a 
list, which the designer prepares, of eligible library 
elements. From this list, the program selects for each 
gate and flip-flop the element of smallest area which:

1. Provides the logic function required by the 
associated element in the data base, 

2. Has the required fan-out capability, and 

3. Has the required fan-in capability. 

Whenever there is no eligible library element with 
adequate fan-in, gate input expanders are automatically 
added. Whenever the source and destination of a 
signal are on different chips, appropriate output and 
input pads are added automatically. To effect further
area minimization, the selection program recognizes
special combinations of logic elements and substitutes 
corresponding special library elements. At the moment 
the only special element substituted is a dual output 
NAND. 
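The selection rule can be sketched as a filter followed by a minimum-area choice. The Element record, the element names and the way missing inputs are reported for expanders are all hypothetical; the sketch does not model pad insertion or the dual output NAND substitution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    function: str
    fan_in: int
    fan_out: int
    area: float


def select_element(eligible, function, fan_in, fan_out):
    """Return (element, inputs_left_for_expanders) for one gate or flip-flop."""
    usable = [e for e in eligible
              if e.function == function and e.fan_out >= fan_out]
    if not usable:
        raise LookupError("no eligible element for " + function)
    fitting = [e for e in usable if e.fan_in >= fan_in]
    if fitting:
        return min(fitting, key=lambda e: e.area), 0
    # No element has enough inputs: take the widest (smallest on ties)
    # and report how many inputs expanders must supply.
    widest = max(usable, key=lambda e: (e.fan_in, -e.area))
    return widest, fan_in - widest.fan_in


library = [
    Element("NAND2", "nand", 2, 4, 1.0),
    Element("NAND4", "nand", 4, 4, 1.6),
    Element("NAND4H", "nand", 4, 10, 2.2),
]
print(select_element(library, "nand", 3, 4))
print(select_element(library, "nand", 6, 4))
```

Filtering on function and fan-out first, then on fan-in, mirrors the three numbered requirements, with area used only to break ties among elements that satisfy them.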

From this point on all processing is done on an in- 
dividual chip basis. 

Placement and interconnection 

Three layers of metal interconnections are generally 
required for the chips within a CIS. In such three 
layer systems the first metal layer is used solely for 
element intraconnections, and the second and third
layers are used for element interconnections. Thus,
the CIS placement and interconnection task is equiva- 
lent to the two-sided PC card placement and intercon- 
nection task. The algorithms used are modifications 
of those which have proved effective tools for generating 
PC card artwork. 

Element placement and interconnection are always 
done using the power bus, ground bus and pad layout 
prescribed by the designer. Several “standard” chip 
layouts are stored in the element library and the par- 
ticular layout specified by the designer is referenced 
by the programs as required. A typical chip layout
is shown in Figure 9.

The CAD system can handle chips of various sizes;
however, there are certain aspects of chip layout which
are standard from chip to chip:

1. Pads are located on the perimeter,

2. Power and ground busses are on separate metal
layers, one under the other,

3. The minimum horizontal dimension of the
region bounded by two segments of a bus or
by a column of pads and a bus segment is C;
the maximum must be 2C (cf. Figure 9).

Figure 9 — Typical chip layout

The CIC placement problem is complicated by
the fact that the library elements which must be
placed on a chip are not all the same size. This is
simplified somewhat by the restrictions imposed on
chip layout and library element design. Reflecting
the restrictions discussed regarding chip layout, all
library elements must be designed with one or the
other of the aspect ratios pictured in Figure 10.

Placement is accomplished in three steps. First
the elements and pads are placed on a regular grid,
assuming that all the elements are the same size; the
particular size chosen is that of the smallest element
which must be placed on the chip. Element pairs are
then interchanged on this grid until a minimum
approximate interconnection distance is found.

Second, the elements are expanded to their
full size into a new, initially empty, grid which actually
represents the chip. Elements are processed one at
a time starting at the center of the “small” grid and
moving outward along a spiral path. For each element
processed all possible positions on the new grid are
evaluated with respect to three criteria: (1) the distance
from the ideal position as defined by the small grid,
(2) the degree of occupancy of this position by elements
already processed, and (3) the angle of rotation between
the lines defined by the grid center and ideal
point and grid center and position being evaluated.
The third criterion is designed to keep the expansion
progressing outward from the center point. If the
position picked as minimal with respect to the above
criteria is partially or fully occupied, a search is
entered to find other positions for the occupying
elements.

The third placement step is to again interchange
pairs of elements so as to minimize interconnection
distance, although this time only elements of
the same size may be interchanged.

Figure 10 — Permissible library element aspect ratios
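The pairwise interchange used in the first and third placement steps can be sketched as follows. Manhattan distance is assumed as the approximate interconnection distance, and the same-size restriction of the third step is not modeled; the function names are hypothetical.

```python
def wire_length(positions, connections):
    """Sum of Manhattan distances over all connected pairs."""
    return sum(abs(positions[a][0] - positions[b][0]) +
               abs(positions[a][1] - positions[b][1])
               for a, b in connections)


def interchange(positions, connections):
    """Swap element pairs while any swap shortens the total distance."""
    positions = dict(positions)
    names = sorted(positions)
    improved = True
    while improved:
        improved = False
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                before = wire_length(positions, connections)
                positions[a], positions[b] = positions[b], positions[a]
                if wire_length(positions, connections) < before:
                    improved = True      # keep the swap
                else:                    # undo the swap
                    positions[a], positions[b] = positions[b], positions[a]
    return positions


start = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
connections = [("A", "D"), ("B", "C")]
placed = interchange(start, connections)
print(wire_length(start, connections), wire_length(placed, connections))
```

Since a swap is kept only when it shortens the total, the procedure stops at a placement that no single interchange can improve.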


For each chip processed, the placement program 
produces two outputs. The first includes a list of all 
of the library elements on each chip, with their absolute
chip location given; this is entered in the element
library. The second is a list of required interconnections;
this is the input for the wiring program. 

The wiring program makes all power and ground 
connections first, using a simple heuristic. Given that 
the point x, y is to be grounded or connected to power,
a bi-directional search beginning at x, y is made in 
a direction perpendicular to the two closest segments 
of the appropriate bus. If an obstruction is encountered 
during the search a turn is made perpendicular to 
the preferred search direction. When one of the bus 
segments is encountered the required connection is 
made. 

When all power and ground connections have been 
processed, element interconnections are made using 
the Lee algorithm. To speed up this process these 
connections are made in two steps. At first each pair 
of points to be connected is enclosed in a rectangle 
and the Lee search is restricted to this enclosing 
rectangle. The particular rectangle chosen for a given 
pair of points is the one whose diagonal passes through 
the two points and is four units longer than the line 
joining the two points. If the program fails to make 
the connection within the enclosing rectangle, the 
pair of points is added to a “failure list” and processing 
continues with the next pair. Once all point pairs 
have been processed, pairs in the failure list are again 
processed; this time, however, the search area is not 
restricted. 
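
The two-step procedure described above can be sketched in Python as follows. This is a minimal illustration, not the original wiring program: it assumes a single wiring layer on a square grid (0 = free cell), and it approximates the enclosing rectangle by the bounding box of the two points widened by a hypothetical `margin` of grid cells rather than by the exact diagonal rule. All function names are illustrative.

```python
from collections import deque


def lee_route(grid, src, dst, bounds=None):
    """Breadth-first (Lee-style) wave expansion from src to dst.

    grid[r][c] == 0 means the cell is free; any other value is an obstruction.
    bounds = (r_min, c_min, r_max, c_max) optionally restricts the search area.
    Returns the cells of a shortest path, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])
    r_min, c_min, r_max, c_max = bounds or (0, 0, rows - 1, cols - 1)
    came_from = {src: None}
    frontier = deque([src])
    while frontier:
        cell = frontier.popleft()
        if cell == dst:
            path = []
            while cell is not None:  # trace back to the source
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = r_min <= nr <= r_max and c_min <= nc <= c_max
            if inside and nxt not in came_from and (grid[nr][nc] == 0 or nxt == dst):
                came_from[nxt] = cell
                frontier.append(nxt)
    return None


def enclosing_rectangle(src, dst, rows, cols, margin=2):
    """Bounding box of the two points, widened by `margin` (an assumption)."""
    return (max(0, min(src[0], dst[0]) - margin),
            max(0, min(src[1], dst[1]) - margin),
            min(rows - 1, max(src[0], dst[0]) + margin),
            min(cols - 1, max(src[1], dst[1]) + margin))


def route_all(grid, pairs, margin=2):
    """Pass 1: restricted search. Pass 2: retry the failure list unrestricted."""
    rows, cols = len(grid), len(grid[0])
    routed, failure_list = {}, []

    def commit(pair, path):
        for r, c in path:  # routed cells become obstructions for later pairs
            grid[r][c] = 1
        routed[pair] = path

    for pair in pairs:
        src, dst = pair
        path = lee_route(grid, src, dst,
                         enclosing_rectangle(src, dst, rows, cols, margin))
        if path is None:
            failure_list.append(pair)
        else:
            commit(pair, path)

    unrouted = []  # pairs left for manual completion
    for pair in failure_list:
        path = lee_route(grid, *pair)
        if path is None:
            unrouted.append(pair)
        else:
            commit(pair, path)
    return routed, unrouted
```

The restriction pays off because the wave expansion visits far fewer cells when most connections can be completed close to the straight line between their end points; only the exceptions incur the cost of a full-area search.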

Restricting the Lee search as described above improves 
running time by as much as 28 percent in some cases. 

The average density, in interconnections per square, 
of the chips processed to date has been 3.8, where a 
square is 10 wiring grid units on a side. At this density 
manual completion has been required for less than 
1 percent of the interconnections processed. 

To facilitate manual completion the output of the 
wiring program is a card deck referred to as a connec- 
tion deck, which can be manually manipulated to 
make those connections which were not made auto- 
matically. These cards actually contain a description 
of the connections in the Graphospec language ac- 
cepted by the element library maintenance programs. 
Thus, these programs can be used to plot metal masks, 
as a basis for deciding how to make the remaining 
connections. 

This manual wiring completion procedure is a 
potential source of errors. A verification program is 
therefore provided to validate all manually introduced 
connections. Actually this program checks all 
connections in the connection deck against the design 
data base and produces an error list of all missing 
and erroneous connections. When a final connection 
deck is obtained it is entered in the element library. 
At this point the element library contains all of the 
information required to produce the artwork for a 
given chip. 

Non-recurring engineering costs 

In June 1969 an experiment was performed to 
measure the non-recurring engineering costs of custom 
integrated circuit design. 

For this experiment, a digital system whose logic 
design was already complete was chosen as a starting 
point. This system, as it existed in prototype form, 
consisted of 42 flip-flops, 215 NAND gates and 20 
NOR gates implemented in 69 conventional dual inline 
packages. 

The experiment began when the system design, in 
the form of four D-size logic diagrams, was received 
at the Research Center. Members of the CAD staff 
transformed the design into a machine-readable form 
using the Logicspec language. It should be noted that 
Logicspec was not being used as a design language, 
but merely as a means of conveying design informa- 
tion to the computer. 

Following the transformation to Logicspec a com- 
plete logic simulation was performed to identify any 
errors introduced by the manual transformation. 
Several such errors were found. In addition, two errors 
were found in the logic diagrams. 

At the beginning of the experiment it was decided 
that the CIS would be implemented using chips 
measuring 140 mils on a side with a maximum of 39 signal 
pads/chip. On these chips power and ground buses 
and pads occupied approximately 7,100 mils² of the 
available area, leaving 12,500 mils² for the placement 
of library elements. 

Using a parts list submitted with the logic diagrams 
it was estimated that with the selected chip size the 
system could be implemented using six chips. The 
six chip partition obtained is characterized in Table II. 
The area utilization figures given there were obtained 
after element selection had been performed. 

The area utilization figures given in Table II clearly 
indicate that the six chip partition makes somewhat 
inefficient use of the available area. Because of pad 
limitations, it was not obvious at the time that fewer 
chips could be used. For this reason the experiment was 
completed using the six chip partition. Subsequently a 
five chip partition was obtained; this is characterized 
in Table III. 

The five chip partition required more gates than the 
six chip partition because it was necessary to trade gates 
for pads in order to stay within the prescribed pad limits. 

Following element selection each of the six chips was 
processed through the placement and wiring programs. 
Of the six chips processed only one required manual 
completion: one connection was made manually. 

The end product of the experiment was a complete set 
of rubylith mask masters (11 mask layers) for one chip 
(chip five). In determining costs it was assumed that 
the mask masters for each of the remaining chips would 
cost approximately the same. 

The professional manpower and computer costs 
required to perform the experiment are summarized 
in Tables IV and V. At the Research Center all plotting 
is done in a multiprogramming environment (i.e., it is 
overlapped); for this reason the summary is broken into 
two parts. The entire experiment was completed in an 
elapsed time of three weeks. 

The non-overlapped time shown above was the time 


TABLE II — Six chip partition. 

Chip     Gates  Flip-Flops  Pads Used  Area in Mils²  % of Available Area
1        26     9           31         10,762         86%
2        40     7           38         11,343         90%
3        38     9           38         11,438         91%
4        32     4           37          7,446         59%
5        43     6           37         11,052         88%
6        28     7           36          8,696         69%
Total    207    42          217        60,757
Average  34     7           36         10,126
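
The area budget and the utilization column of Table II can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative, and the use of truncation rather than rounding is an assumption inferred from the tabulated percentages.

```python
# Chip geometry as stated in the text.
CHIP_SIDE_MILS = 140
BUS_AND_PAD_AREA = 7_100  # mils² taken by power/ground buses and pads

total_area = CHIP_SIDE_MILS ** 2                 # 19,600 mils²
available_area = total_area - BUS_AND_PAD_AREA   # 12,500 mils² for library elements


def percent_of_available(element_area):
    """Percentage of the placement area used, truncated to a whole percent
    (truncation is an assumption; it matches the Table II figures)."""
    return 100 * element_area // available_area


# Element areas per chip, copied from Table II.
table_ii_areas = {1: 10_762, 2: 11_343, 3: 11_438, 4: 7_446, 5: 11_052, 6: 8_696}
utilization = {chip: percent_of_available(area)
               for chip, area in table_ii_areas.items()}
```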
TABLE III — Five chip partition. 

Chip     Gates  Flip-Flops  Pads Used  Area in Mils²  % of Available Area
1        33     10          37         12,321         98%
2        44     7           36         12,117         97%
3        37     10          39         12,136         97%
4        51     8           38         12,117         97%
5        45     7           39         12,162         97%
Total    210    42          189        60,853
Average  42     8.4         37.8       12,170



TABLE IV — Manpower and computer costs for experiment (exclusive of plotting). 

                              Professional  % of   Computer Hours  % of
                              Man Hours     Total  (IBM 360/30)    Total
Transfer Design to Logicspec  40.5          39%    1.8             7%
Logic Simulation              26.0          25%    1.6             7%
Convert to NAND Logic         2.0           2%     0.4             2%
Partition System to 6 chips   31.0          30%    11.0            45%
Library Element Selection     0.5           0.5%   0.6             3%
Placement of Elements         3.0           3%     4.1             17%
Interconnection of Elements   1.0           1%     4.5             19%
Totals                        104.0                24.2



TABLE V — Computer costs for plotting portion of experiment. 

                                      Computer Time (IBM 360/30)
                                      Non-Overlapped  Overlapped
                                                      (Plotting)
Prepare Composite for Manual
  Interconnection Completion          0.3             0.8
Prepare Mask Masters Chip 5           3.3             11.8
Prepare Mask Masters for Other
  Chips (Extrapolation)               16.5            59.0
Totals                                20.1            71.6


required to load a disk file with the information which 
was to be plotted. 

Experience at the Research Center indicates that 
approximately 2.5 nonprofessional man hours are 
required to strip and check a rubylith mask master. 
Including this the total cost for a final set of mask 
masters for the six chips is as summarized in Table VI. 


TABLE VI — Non-recurring engineering 
costs of CIS design. 

Professional Man Hours              104.0
Non-Professional Man Hours          165.0
Non-Overlapped IBM 360/30 Hours      44.3
Overlapped IBM 360/30 Hours          71.6
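
The totals in Table VI appear to follow from figures given earlier; the short Python sketch below reproduces them. The decomposition of the non-professional hours as chips × mask layers × hours per mask is an inference that happens to match the tabulated 165.0, not a formula stated in the paper, and the variable names are illustrative.

```python
# Figures taken from the text and from Tables IV and V.
CHIPS = 6
MASK_LAYERS_PER_CHIP = 11             # "11 mask layers" per set of mask masters
STRIP_AND_CHECK_HOURS_PER_MASK = 2.5  # nonprofessional hours per mask master

professional_hours = 104.0            # Table IV total

# Assumed decomposition: every mask of every chip is stripped and checked once.
non_professional_hours = CHIPS * MASK_LAYERS_PER_CHIP * STRIP_AND_CHECK_HOURS_PER_MASK

# Non-overlapped machine time = design steps (Table IV) + disk loading (Table V).
non_overlapped_hours = round(24.2 + 20.1, 1)

# Overlapped machine time is the plotting total from Table V.
overlapped_hours = 71.6
```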
CONCLUSION 

Development of the CAD system described herein 
required approximately twelve man years of effort. The 
system is now providing the tools which make the task 
of developing a CIS as simple as, and as regards 
non-recurring costs, no more expensive than the task of 
developing the same system using discrete IC packages 
and printed circuit boards. 
INTRODUCTION 

From the earliest times, man has made his mark. At 
first his marks were made with his own fingers on 
walls of caves. He used a chisel or brush to create 
pictures of animals. He developed symbols, alphabet 
and languages. Man used marks to pass information 
from person to person and from generation to gener- 
ation. Through the ages, man recorded information 
to be used again and again. He recorded history, 
mathematics and law. These things brought order to 
his life. The history of civilization is the history of 
man’s ability to communicate, record and make marks. 

In making marks, there is almost always a moving 
object. Man used his own fingers. Today most marks 
are made by a type slug, a print hammer, a moving 
drum, or some mechanical device. And now man has 
electronic digital computers. These machines manipu- 
late and generate information at unprecedented speed. 
Man’s need to make marks has multiplied many times 
in the past few decades. Much of the drudgery of 
handling information has been relegated to the com- 
puter. The speed of computers is so great that mechan- 
ical mark-making devices can no longer keep pace. 
Devices using a stylus or print hammer will not move 
fast enough and require too much maintenance. 

This is the beginning of our story — a new method 
for making marks — COM. 

What does COM mean? 

1. Computer Output Microfilm: microfilm containing 
data, produced by a recorder from computer 
generated electrical signals. 

2. Computer Output Microfilmer: a recorder which 
converts data from a computer into human 
readable language and records it on microfilm. 

3. Computer Output Microfilming: a method of 
converting data from a computer into human 
readable language onto microfilm. 

* Also Director, National Microfilm Association, Annapolis, 
Maryland. 

This paper will describe COM technology and the 
various types of COM recorders. Some of the uses and 
applications will be explored. A description of the 
various recorders and a comparison of the units will 
be made. Microfilm origination, dissemination and 
retrieval systems will be reviewed. Some COM market 
forecasts will be looked at and a survey of the field 
by the National Microfilm Association will be pre- 
sented. 

General 

Over the past several years, American industry as 
well as the scientific community have turned increasing- 
ly to the use of computers and microfilm as a means of 
controlling what is referred to as the “paperwork 
explosion.” Computers and microfilm have been 
generally used independently to cope with the same 
problem. Both have been successful, but neither alone 
has completely solved the problem. The effect of com- 
bining microfilm and the computer in a system for 
information handling may turn out to be more dramatic 
than the effect of either alone. 

Computer systems of all generations, first, second and 
third, have been plagued by an imbalance of speeds. 
The functions of computer systems, namely input, 
processing, and output, though intertwined as functions, 
have been sadly imbalanced in their speed relationships 
one to the other. The computer itself, or the main 
frame, has seen an ascension of speed and power of 
phenomenal proportions from the mid-1950's to the 
present. The older vacuum tube equipment could 
process at thousandths of seconds, or milliseconds. The 
transistor and solid state technology brought forth 
microsecond speeds, or millionths of a second. Finally, 
the third generation in this evolution, the micrologic 
of integrated circuits, has caused nanosecond speeds, 
billionths of a second, to be realized. However, the 
input/output twins have seen no similar evolution. 
On the input side the basic medium of data input is 
still the EAM card, which is over 30 years old. On the 
output side mechanical printing and its hardcopy paper 
medium have been the major avenue of getting the 
information to the user. 

Although there have been several major efforts to 
improve the input/output situation, and especially to 
eliminate the output bottleneck, none has succeeded 
until now. The Computer Output Microfilmer, or COM 
recorder, provides the solution to the computer output 
problem. A COM recorder has the output equivalent of 
as many as 30 impact printers operating simultaneously. 
Some COM units have a transfer rate as high as 100,000 
characters per second (transfer rate: the speed at 
which information can be transferred from magnetic 
tape to microfilm). 

The COM is a device which records computer data on 
microfilm in human readable form. It is a recorder 
which may be connected directly to the computer for 
“on line” operation or to a magnetic tape unit for 
“off-line” operation. The magnetic tape unit “reads” 
information into the COM from a magnetic tape which 
previously has been recorded directly from the com- 
puter. 

There are three types of COM devices: 

Business — alphanumeric printer 

Scientific — alphanumeric printer and plotter 

Graphic Arts — special quality alphanumeric 
printer and plotter 

Recording the output of a digital computer directly 
on microfilm is not new. As early as 1955 at least one 
COM recorder was in use for this purpose. The early 
units as well as some of the new units were designed 
for scientific work. These recorders are printer-plotters; 
that is, they are capable of reducing the digital output 
of computers to convenient, usable plots and curves 
that are annotated with alphanumeric information. 
Figures 1 and 2 are typical scientific plots. This was 
the role of the COM until recent years when some of 
the scientific users began using the printing capability 
for non-scientific alphanumeric listings. 
Figure 3 — Typical business information-alphanumeric 


These non-scientific (business) applications prompted 
the development of special COM devices which are 
designed for high speed recording of alphanumeric 
computer output. These units record the same type of 
information as impact printers, but they are much 
faster and the information is placed on microfilm 
instead of paper. Figure 3 shows an example of this 
type of information. Thousands of computers in use 
today do not yield full capacity. The computer systems 
are slowed down by their output devices, the impact 
printers, which produce too much paper. The mountains 
of printout they produce are smothering the very 
efficiencies for which computers were designed. These 
thousands of computers do not put vital information 
into the hands of the right people in the right places 
in time for the right decisions. 

Figure 4 — Typical business report-graphics 

These new business COM recorders can solve this 
problem through the following advantages the COM 
system has over the impact printing system: 

1. Printing at computer tape speeds. 

2. Forms printed with data simultaneously. 

3. Retrieval coding placed on records as they are 
created. 

4. Smaller records storage. 

5. Reduced cost of supplies and material. 

6. Weight of information significantly reduced. 

7. Microfilm doesn’t have to be decollated, burst 
or bound. 

The third type of COM is the graphic arts printer. 
This is an electronic composition system. This type 
of recorder can produce alphanumerics and graphics 
with graphic arts quality at data processing speeds. 

The evolution of the COM is quite interesting. It 
began with the scientific device being used for plotting 
technical data in graphic form. Now it is being used 
extensively by business for alphanumerics as a 
replacement for impact printers. I predict that business 
management will quickly realize that they too would 
gain great advantages from the scientific type of 
system, having their business information plotted 
and presented in graphic form instead of as 
alphanumerics, or instead of having a draftsman manually 
prepare charts from alphanumeric data. Figure 4 shows 
a business report produced by a scientific COM recorder. 

Technology 

Speed 

The most obvious technological advantage of a COM 
is the speed at which computer information is translated 
into human readable form on microfilm. It is difficult 
to visualize or appreciate this speed. I, therefore, 
present these comparisons: 

5,000 electric typewriters = 30 impact printers = 1 COM recorder 

Looking at it another way: 

                  Characters  Lines      Lines       Pages
                  per sec.    per min.   per hr.     per hr.
Typewriter        15          7          400         6
Impact Printer    2,420       1,100      66,000      1,031
COM recorder      70,000      32,000     1,900,000   30,000
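
The columns of this comparison are related by simple unit conversions, sketched below in Python. The line length of 132 characters and the page length of 64 lines are assumptions inferred from the impact printer row (they reproduce 1,100 lines per minute and 1,031 pages per hour from 2,420 characters per second); the paper does not state them.

```python
CHARS_PER_LINE = 132   # assumed print line width
LINES_PER_PAGE = 64    # assumed page length


def throughput(chars_per_sec):
    """Convert a character rate into line and page rates."""
    lines_per_min = chars_per_sec * 60 / CHARS_PER_LINE
    lines_per_hr = lines_per_min * 60
    pages_per_hr = lines_per_hr / LINES_PER_PAGE
    return {
        "lines_per_min": lines_per_min,
        "lines_per_hr": lines_per_hr,
        "pages_per_hr": pages_per_hr,
    }


impact_printer = throughput(2_420)
com_recorder = throughput(70_000)

# Ratio behind the "30 impact printers = 1 COM recorder" comparison.
printers_per_com = 70_000 / 2_420
```

Under these assumptions the COM recorder row comes out near 32,000 lines per minute and 30,000 pages per hour, and the ratio of character rates is roughly 29, consistent with the rounded figures in the comparison.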



Cathode ray tube (CRT) systems 

The Computer Output Microfilmer, as the name 
implies, produces computer generated microfilm records 
with no intervening paper copy. This is achieved by 
converting the computer digital signals to voltages 
which are applied to a cathode ray tube. (Another 
method, electron beam recording, will be described 
later.) This conversion process results in the infor- 
mation being displayed on the cathode ray tube screen 
in human understandable form. The microfilm record 
is produced by photographing the information dis- 
played on the cathode ray tube. The basic nature of 
this process is illustrated in Figure 5. 


Figure 5 — CRT microfilm recording 

Electron beam recording (EBR) systems 

The second method of recording directly on microfilm 
uses an electron beam (see Figure 6). Using the 
stroke method, much like that of a pencil writing on 
paper, the electron beam writes a latent image directly 
on dry-silver microfilm. The electron beam originates 
at the cathode of the electron gun, located on the top of 
a sealed housing. Electrostatic plates and 
electromagnetic yokes, or magnetic lenses, deflect the beam 
to form characters and position them on the microfilm 
frame. The housing is similar in principle to a 
cathode ray tube except that in place of a phosphor 
screen it has a small aperture through which the beam 
passes to write directly on the microfilm. Vacuum pumps 
reduce the air pressure within the housing to a level 
low enough to facilitate generation and precise control 
of the beam. Because the beam has practically no 
inertia, it can be deflected, or modulated, rapidly 
enough to keep pace with the data transfer rates of 
the tape drive. 

Figure 6 — Electron beam recording on microfilm 

Character generation 

There are several methods of creating characters for 
COM recording. Stromberg-Datagraphix has developed 
a special cathode ray tube called a Charactron®* 
Shaped Beam Tube. The Charactron tube creates an 
image by directing an electron beam through individual 
characters cut in a matrix, a thin precise disc 
with alphanumeric and symbolic characters etched 
through it. This matrix is located within the neck of 
the tube. This method extrudes the beam into the shape 
of the character being printed. This has the effect of 
stenciling each character onto the tube face. 

* Charactron is a trademark of Stromberg-Datagraphix, Inc. 

Another method of creating characters is by the 
use of a “stroke” generator. In this type of system a 
spot is deflected to trace the shape of the character 
desired. The voltages necessary to deflect the spot 
are generated by sweep generators, one for X deflection 
and one for Y deflection. Instructions for the 
characters are stored in memory. About 16 strokes are 
used on average per character. 
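
The idea of a stroke generator can be illustrated with a small Python sketch. Everything here is hypothetical: the stroke encoding (a relative X step, a relative Y step, and a beam on/off flag) and the sample letter are invented for illustration and do not describe the instruction format of any actual recorder.

```python
def trace(strokes, origin=(0, 0)):
    """Follow stored stroke instructions and return the visible line segments.

    Each stroke is (dx, dy, beam_on): the X and Y deflection changes for one
    stroke, and whether the spot is unblanked while it moves.
    """
    x, y = origin
    segments = []
    for dx, dy, beam_on in strokes:
        nx, ny = x + dx, y + dy
        if beam_on:
            segments.append(((x, y), (nx, ny)))
        x, y = nx, ny
    return segments


# Hypothetical stroke table entry for the letter "T" in a 4 x 6 cell.
LETTER_T = [
    (0, 6, False),   # move blanked to the top left corner
    (4, 0, True),    # draw the top bar
    (-2, 0, False),  # move blanked back to the middle of the bar
    (0, -6, True),   # draw the stem downward
]

segments = trace(LETTER_T)
```

A simple letter such as this needs only a handful of strokes; more complex characters need more, which is consistent with an average of about 16.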

Characters can also be created by point plotting. 
This method is generally used for special symbols or 
type faces. 

Line generation 

Line generation in scientific COM recorders is done 
by a “line” or “vector” generator, also known as a 
vector stroke generator, which draws straight line 
segments (vectors). Line width, vector direction and 
intensity levels are all generally programmable. 

Forms overlay 

Forms overlay features are provided on most units. 
The forms overlay feature provides the capability of 
superimposing predetermined, fixed forms with the 
generated image. Forms are interchangeable by an 
operator or on some units may be called in by program. 
These forms may contain maps, company logos, charts 
and graphs such as the one in Figure 7. 

Retrieval coding 

COM recorders can generate retrieval codes and 
patterns for every frame or for selected frames of 
information. The following coding systems are usually 
standard features: Codeline, Image Blip (Image Count) 
and Miracode. These indexing identifiers are recorded on 
film simultaneously with the data. This feature is the 
key to easy push-button retrieval of information on 
microfilm. 

Films 

There are two recording films in use today in the 
COM field. Almost all CRT systems use Kodak (Re- 
cordak) Dacomatic* film, types 5461 and 7461. The 
EBR systems use 3M Computer Film, type 761 (dry- 
silver). The Dacomatic film is available in the following 
sizes: 

a. 105mm nonperforated 

b. 35mm with perforations 

c. 16mm with perforations 

d. 16mm nonperforated 

The 3M dry-silver film is available in 16mm non- 
perforated form. 

The 105mm film is used in the business type COM 
and the film is cut and used as microfiche. The 35mm 
film is generally used for scientific work (graphics). The 
16mm nonperforated film is used for almost all business 
applications. The perforated 16mm film is only used 
for special high precision applications. 

* Dacomatic is a trademark of Eastman Kodak. 

Figure 7 — Forms overlay 

Users and applications 

Scientific 

A few of the scientific applications are: circuits, 
printed wiring board masters, thin film masks, animated 
movies, graphs and charts. See Figures 1, 2 and 3. 

The following are some of the organizations using 
scientific COM recorders: 

North American Aviation 
NASA 

Collins Radio 

Bell Telephone Laboratories 
Lawrence Radiation Laboratories 
MIT Lincoln Laboratories 

Business 

Business applications include all types of listings, 
account reports, management reports and anything 
that might have been produced by a computer and 
impact printer. The following are some of the 
organizations using COM’s in business applications: 

Sears Roebuck & Company 
J. C. Penney Company 
Social Security Administration 
Equitable Life Assurance Society 
Bureau of the Census 
International Harvester Company 

Systems service centers 

At the present time there are over 40 systems service 
companies operating COM service centers in the 
following cities in the United States: 

California 
    Canoga Park 
    Culver City 
    El Segundo 
    Glendale 
    Los Angeles 
    San Francisco 
    Stockton 
Colorado 
    Boulder 
    Denver 
Connecticut 
    Hartford 
    Westport 
Florida 
    Miami 
Georgia 
    Atlanta 
Illinois 
    Chicago 
Indiana 
    Indianapolis 
Louisiana 
    New Orleans 
Maryland 
    Baltimore 
    College Park 
Massachusetts 
    Boston 
    Springfield 
    Wilmington 
Michigan 
    Dearborn 
    Detroit 
Missouri 
    St. Louis 
New Jersey 
    Cherry Hill 
    Dayton 
New York 
    Binghamton 
    Buffalo 
    New York City 
    Rochester 
    Spring Valley 
    White Plains 
North Carolina 
    Winston-Salem 
Ohio 
    Columbus 
    Cleveland 
    Dayton 
Pennsylvania 
    Philadelphia 
    Pittsburgh 
Texas 
    Austin 
    Dallas 
    Houston 
Utah 
    Salt Lake City 
Virginia 
    Arlington 
Washington, D. C. 
Wisconsin 
    Brookfield 

COM recorders 

At the writing of this paper the following companies 
were marketing COM units: 

a. AMETEK/Straza (Scientific) 

b. Beta Instrument (Scientific) 

c. California Computer Products (Scientific) 

d. Canon (Business) 

e. Computer Micro-Data Systems (Scientific) 

f. Computer Industries (Scientific & Business) 

g. Control Data (Scientific) 

h. Eastman Kodak (Business) 

i. Information International (Scientific) 

j. 3M (Business) 

k. RCA (Graphic Arts) 

l. Scan Graphics (Scientific) 

m. Singer-Link (Scientific) 

n. Stromberg-Datagraphix (Scientific & Business) 

Total COM systems 

As can be seen in Figure 8 there is very little difference 
between photographing a CRT and photographing a paper 
document. The film selected for recording from a CRT 
should be matched in sensitivity to the phosphor of the 
tube. The polarity of the image on a CRT is 
negative (light lines on a dark background) and on 
paper it is usually positive. Therefore, with normal 
film processing the image of the CRT will be reversed 
and appear on film as a positive, and a microfilm of a 
positive paper document will appear as a negative. 
Since most users of microfilm prefer to use negative 
images in readers and for making hardcopy, it is 
necessary to obtain a negative image of the COM film. 
This is done in one of two ways. In the first method, 
the recording film is flashed at the time of processing 
and developed in a special movie processor, which 
provides a negative image on film from the negative CRT 
image. The second method is to make a second 
generation duplicate on Kalvar or silver film, which 
reverses the polarity again: from a negative CRT image 
we get, with normal processing, a positive first 
generation recording film and then a 
negative second generation duplicate. 
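
The polarity bookkeeping described above can be followed with a short Python sketch. It simply encodes the rules as stated in the text: normal processing and a second generation duplicate each reverse polarity, while flash reversal processing preserves it. The step names and function names are illustrative, not terminology from any processor manual.

```python
def flip(polarity):
    """Return the opposite image polarity."""
    return "positive" if polarity == "negative" else "negative"


def film_polarity(source_polarity, steps):
    """Track image polarity from the source image through each film step."""
    polarity = source_polarity
    for step in steps:
        if step in ("normal_processing", "duplicate"):
            polarity = flip(polarity)   # each of these reverses the image
        elif step == "flash_reversal":
            pass                        # reversal processing keeps the polarity
        else:
            raise ValueError(f"unknown step: {step}")
    return polarity


# CRT images are negative (light lines on a dark background).
first_method = film_polarity("negative", ["flash_reversal"])
second_method = film_polarity("negative", ["normal_processing", "duplicate"])
paper_document = film_polarity("positive", ["normal_processing"])
```

Both methods end with a negative film image; they differ in whether the negative is the original recording film or a second generation duplicate.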

Figure 9 depicts the various systems used for COM 
operations. In scientific applications the film is most 
often put in aperture cards or used as short strips or on 
reels. In most business applications the film is used in 
roll form in cartridges. There are a few systems where 
the film is cut and pasted up to make a master 
microfiche. A recent development, the 105mm film head for 
a COM, provides microfiche directly and therefore 
eliminates much of the manual labor in producing 
microfiche. In most business applications film duplicates 
are required to disseminate the information to many 
users. In all systems, readers, reader-printers, retrieval 
devices and enlarger-printers are needed by the end 
users of microfilm. Additional information on these 
items can be obtained from the National Microfilm 
Association's “Guide to Microreproduction Equipment,” 
now in its fourth edition. 

There are six generally used methods of making 
copies of computer generated reports. Figure 10 pro- 
vides a cost comparison of a 100 page report. As can 
be seen on the graph, distribution of microfilm dupli- 
cates is the lowest cost method at any quantity of 
copies. 

The COM market & NMA survey 

At the end of 1968 there were about 300 COM 
recorders in use. Of this number about 60 units were 
being operated by systems service companies. 

Figure 8 — Microfilming systems 

There have been many forecasts made of the COM 
field, with as many conclusions as studies. Figure 11 
shows the range of these forecasts, which is that by 
1975 there will be between 6,000 and 12,000 recorders in 
operation. The cost of a COM is $60,000 to $300,000, 
with the average being about $100,000. The average 
rental is on the order of $40,000 per year. 

In the Spring of this year the National Microfilm 
Association made a survey of all of its more than 3,000 
members with regard to the use of Computer Output 
Microfilm. The following are some of the statistics 
obtained and my comments: 

1. Questionnaires were returned by 24 percent of 
those queried. 

Comment: 24 percent is considered an excellent 
return on a direct mail survey. NMA 
members are interested in the COM 
field. 

* Cost of microfilm retrieval equipment NOT included 

Figure 10 — Comparative costs of creating copies of 
computer generated output 

Figure 11 — COM recorder unit placement forecast 

NOTE: SYSTEMS SERVICE COMPANIES 
HAVE NOT BEEN INCLUDED IN ANY 
OF THE FOLLOWING STATISTICS. 

2. Questionnaires were returned by 74 organizations 
indicating they now have one or more COM 
recorders; 29 were scientific and 59 were business units. 

Comment: 33 percent of COM’s are scientific 
type today; 67 percent of COM’s are 
business type today. 

3. 105 organizations indicated they would obtain 
their first COM in the next two years. 28 
organizations indicated they would obtain an 
additional COM in the next two years and 55 
organizations indicated the use of a COM was 
under study. 

Comment: By the end of 1970 there will probably 
be more than 1,000 COM’s in use. 

4. Positive versus negative original recording film: 
56 using positive (normal processing) 
29 using negative (flash reversal processing) 
A few organizations use both positive and 
negative. 

Comment: Even though special processing 
equipment is required to obtain a 
negative image on the original film, 
it is being done; there must be a 
need. 

5. The following film processors are being used in 
COM systems: 

Fulton              10 users 
Kodak               18 users 
Remington Unipro     3 users 
Stromberg            6 users 
Other               35 users 

6. The following is the quantity of original recording 
film being used per month by 48 respondents 
who gave figures: 

16mm perforated 80,000 feet 

16mm nonperforated 852,000 feet 

35mm perforated 62,000 feet 

35mm nonperforated 31,000 feet 

105mm nonperforated 6,000 feet 

Comment : The following is an estimate of 
the quantity of recording film all 
COM’s are currently using per 
month. 

16mm perforated 400,000 feet 

16mm nonperforated 4,200,000 feet 

35mm perforated 300,000 feet 

35mm nonperforated 150,000 feet 

7. 57 of the 74 users duplicate their film. 

23 Diazo 
29 Kalvar 
22 Silver 

Some used more than one process. 

Comment: Convenience and turnaround time 
are most important. 

8. Duplicating film being used per month by 40 
COM systems reporting: 

16mm — 3,800,000 feet 
35mm — 35,000 feet 
105 x 148mm — 34,000 fiche 
3-1/4" x 7-3/8"— 270,000 fiche 
6" x 8"— 34,000 fiche 
Aperture cards— 367,000 cards 

Comment : The following is an estimate of the 
quantity of duplicating film being 
used per month by all COM 
systems: 

16mm— 22,800,000 feet 
35mm— 200,000 feet 
Aperture cards — 2,200,000 cards 
Microfiche (various sizes) — 2,000,000 fiche 

9. Microforms being used in COM systems re- 
porting: 

Roll Film (including cartridges) — 56 users 
Microfiche — 21 users 
Jackets — 13 users 
Aperture cards— 13 users 

Comment : The following are the percentages 
of COM systems using each micro- 
form: 

Roll film (including cartridges) — 55 percent 
Microfiche — 21 percent 
Jackets — 12 percent 
Aperture cards — 12 percent 

10. For those using roll film and cartridges the 
following indexing systems are in use: 


Miracode* — 10 users 

Image Blip (Image Count) — 24 users 

Code Line— 7 users 

Flash — 9 users 

Other — 12 users 

Comment : Most COM systems are now using 
the Image Blip (Image Count) 
system of retrieval. 

11. Regarding a question on the use of hardcopy 
the following responses were received: 

Never used — 5 users 
Seldom used— 20 users 
Frequently used — 34 users 
Always used — 6 users 

Comment : Hardcopy is required, but on a 
selected basis. 

12. For 35 respondents, 844,000 pages of hardcopy 
are produced each month. 

Comment: The average COM system pro- 
duces 24,000 pages of hardcopy per 
month. 
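
Several of the comments above follow from simple arithmetic on the reported counts. The short sketch below recomputes three of them. The variable names are illustrative, and the scale factors are only what the published estimates imply, since the survey does not state how the extrapolation to all COM's was made.

```python
# Recompute a few of the survey comments from the reported counts.
# Names are illustrative; figures are taken from items 2, 6 and 12.

scientific_units, business_units = 29, 59
total_units = scientific_units + business_units
scientific_share = round(100 * scientific_units / total_units)
business_share = round(100 * business_units / total_units)

# Item 6: feet of original film per month, reported vs. estimated for all COM's.
reported = {"16mm perforated": 80_000, "16mm nonperforated": 852_000,
            "35mm perforated": 62_000, "35mm nonperforated": 31_000}
estimated = {"16mm perforated": 400_000, "16mm nonperforated": 4_200_000,
             "35mm perforated": 300_000, "35mm nonperforated": 150_000}
implied_factor = {film: estimated[film] / reported[film] for film in reported}

# Item 12: average hardcopy pages per reporting system per month.
average_hardcopy = 844_000 / 35

print(scientific_share, business_share)
for film, factor in implied_factor.items():
    print(f"{film}: x{factor:.1f}")
print(round(average_hardcopy))
```

The implied factors all fall close to five, which suggests (but does not prove) that the respondents who gave figures were taken to represent roughly one fifth of installed COM's.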

Standards 

In February of 1968 the National Microfilm Association 
(NMA) established a committee to investigate 
and recommend standards for microfilm produced by 
COM recorders. This committee has members from 
most of the COM manufacturers, several COM systems 
service companies and many users in government and 
industry. There are three sub-committees, each with a 
mission, as follows: Format, Quality and Glossary. 

The National Microfilm Association is attempting to 
coordinate the activities of this new microfilm appli- 
cation by considering standards, reporting of many 
specific applications in its Journal and having COM 
exhibits at its annual convention. 

For additional information on the COM field, write 
to the National Microfilm Association, P.O. Box 386, 
250 Prince George Street, Annapolis, Maryland 21404. 


* Miracode is a trademark of Eastman Kodak. 





The microfilm page printer — Software 
considerations * 


by S. A. BROWN 

Datalogics, Inc. 

Chicago, Illinois 

INTRODUCTION 

Magnetic tape microfilm recorders have been available 
in the market place for the past several years. It has 
been only within the last eight months or so that a 
general awareness of these devices has developed. 
Trade magazines and journals are now carrying feature 
articles describing computer based, microfilm infor- 
mation systems. Investment houses are releasing 
surveys and market evaluations of this area. Talks on 
the economics and human engineering aspects of this 
approach are being presented at many technical con- 
ferences. Little, however, has been said about program- 
ming considerations for the preparation of the specially 
formatted magnetic tape required for the operation of 
these devices. The purpose of this paper, then, is to 
examine the flexibilities and capabilities of magnetic 
tape microfilming as viewed by the programmer, to 
discuss the software problems that he faces when 
attempting to use such a device and to describe several 
generalized solutions to these problems. 

The machine 

A typical such device is the Series F Electron Beam 
Recorder manufactured by 3M Company. It is a micro- 
film page printer with an extended graphic set. The 
page area is a 132x64 character array organized as 
64 lines each containing maximally 132 characters. 
Data to be printed reside as line images on magnetic 
tape. Each line is represented as a character string 
prefaced by one or two characters of coordinate data 
and terminated by a delimiter character. The coordinate 
characters specify the position of the line on the page. 
This specification may be absolute, in terms of a 
specific character in the page array, or relative to the 
last printed line. 

* The work described in this paper was sponsored by 3M Company, 
Computer Graphics, St. Paul, Minnesota. 

A page printer differs from a conventional impact 
line printer in allowing the line to be placed randomly 
on the page rather than in ascending line sequence. It 
is possible to skip from the bottom of a page to the 
top as easily as from the top to the bottom. This is 
illustrated in Figure 1. 

This may be contrasted to an impact printer which 
can only advance. 
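
The line image described above (coordinate data, text, delimiter) can be pictured with a small sketch. The flag bytes, the one-byte position and the delimiter value are placeholders chosen for illustration; the actual Series F control codes are not given in this paper. Relative positioning is simplified here to a forward offset.

```python
# Sketch of one line image: coordinate data, text, delimiter.
# Flag bytes, byte layout and delimiter are hypothetical, not the real EBR codes.

PAGE_LINES = 64      # lines per page
LINE_WIDTH = 132     # maximum characters per line
DELIMITER = b"\x00"  # hypothetical end-of-line delimiter
ABSOLUTE = b"A"      # hypothetical flag: next byte is a line number on the page
RELATIVE = b"R"      # hypothetical flag: next byte is an offset from the last line


def encode_line(text: str, position: int, absolute: bool = True) -> bytes:
    """Build one tape record for a line of print."""
    if len(text) > LINE_WIDTH:
        raise ValueError("line longer than 132 characters")
    if not 0 <= position < PAGE_LINES:
        raise ValueError("position must be in 0..63")
    flag = ABSOLUTE if absolute else RELATIVE
    return flag + bytes([position]) + text.encode("ascii") + DELIMITER


record = encode_line("TOTAL  1,234.56", position=10)
print(record)
```

Because each record carries its own position, records for the bottom of a page may precede records for the top, which is the property the text contrasts with an impact printer.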

The flexibilities of a page printer can best be appreci- 
ated by considering the problem of printing a report 
containing, say, three vertical columns. To print such a 
report on a conventional line printer would require 
buffering an entire page in core or at least the first two 
columns before any data could be printed. In a page 
printer environment, this restriction is removed. 
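
The three-column case can be sketched as follows. Each entry is given a page, row and starting column as soon as it is read, so nothing has to be held in core. The (row, column) addressing and the even 44-character column width are assumptions made for the illustration, not a description of the recorder's actual coordinate scheme.

```python
# Sketch: placing a stream of entries into three vertical columns without
# buffering a page. The (row, column) addressing is an assumption.

from typing import Iterable, Iterator, Tuple

PAGE_LINES = 64
LINE_WIDTH = 132
COLUMNS = 3
COLUMN_WIDTH = LINE_WIDTH // COLUMNS  # 44 characters per column


def place_in_columns(entries: Iterable[str]) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (page, row, start_column, text) in arrival order."""
    per_page = PAGE_LINES * COLUMNS
    for index, text in enumerate(entries):
        page, slot = divmod(index, per_page)
        column, row = divmod(slot, PAGE_LINES)
        yield page, row, column * COLUMN_WIDTH, text[:COLUMN_WIDTH]


placements = list(place_in_columns(f"ITEM {n:04d}" for n in range(130)))
print(placements[0])    # first entry, top of the first column
print(placements[64])   # 65th entry returns to the top of the page
```

On a line printer the same report would need the first two columns held back until the third was known; here the 65th entry simply addresses row 0 again.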

Extended character set 

Electronic, rather than mechanical, generation of the 
character set provides a wide variety of available 
graphics. In addition to the standard upper case set, 
most microfilm printers provide a lower case as well 
as a bold face. The Series F Electron Beam Recorder further 
includes a large size set. 

This graphic variety allows design of highly legible 
microfilm documents that previously could be obtained 
only at the expense of typesetting. 

Figure 1 — Microfilm page organization 


Forms or line art 

This machine provides a means of inserting line 
drawings or ruled forms with printed headings similar 
to custom line printer forms. This capability allows 
insertion of single fixed forms, random retrieval from 
a library of 30 images and sequential retrieval from 
a file of 2000 images. 

Applications 

The applications for this device range from 
replacing current impact line printers printing on stock 
or custom forms to preparation of material that is 
typeset or typed, such as illustrated parts catalogues 
and directories. 

Software implications 

Typically, a microfilm printer is used to replace some 
or all of the functions of a line impact printer. 
Conceptually this is easy to visualize; line printing is a 
subset capability of page printing. The user, however, 
finds himself in one of two situations. Either his 
program prints directly on-line or formats a tape for 
off-line printing. The former case obviously implies 
changes to the application program; the latter implies 
either program modification or a tape to tape transcrip- 
tion pass. The low unit page cost exhibited by microfilm 
recorders makes even a nominal tape to tape computer 
charge relatively expensive. It may amount to 20-40 
percent of the total microfilm cost. The apparent 
alternative is program modification. Typical program 
conversion takes from two hours to two days, depending 
on the availability of program source, documentation 
and test data. Although the reprogramming 
time is minimal for a single program, if the universe 
consists of hundreds, as is normal, a major expenditure 
of effort is required. Further, in those instances where 
the user wishes to retain the original program for 
back-up purposes, he is forced to maintain both 
programs. The user is confronted with the potential 
requirement for a large re-programming investment 
and must weigh this against the economics of a micro- 
film system. 

The user is in a similar position when he requires the 
extended character set, page printer or forms capa- 
bilities. This time, however, he is really modifying his 
application and can be expected to expend program- 
ming effort. He has more than a simple media con- 
version problem to solve; he is designing a microfilm 
format that did not previously exist. 

In so modifying his application, he has to consider 
all of the characteristics and idiosyncrasies of the 
specific microfilm printer he is going to use. These 
include placement of inter-record gaps and control 
codes within the text of the microfilm document. 


Solutions 

In the best of circumstances the user would prefer 
to see extensions to his operating system and program- 
ming languages to support output devices with extended 
graphic and page printing capabilities. If he desires 
merely the same output on microfilm that he obtained 
on hard copy, he should have to change only a pe- 
ripheral assignment statement in his job deck and 
execute his program. In the case where the user re- 
quires full utilization of the microfilm device’s capa- 
bilities, he would prefer to resort to new statements in 
his application languages such as COBOL, PL/I, RPG, 
etc. These might include facilities for declaring multi- 
column output, invoking alternate character sets or 
specifying insertion of graphics. 

3M Company has recognized the need for system 
software with these capabilities and feels that as the 
computer microfilm user community grows, operating 
system and language implementors will include them 
in future systems. During the initial design phases of the 
Series F EBR, they asked us to formulate interim 
solutions for several specific computer systems. We were 
instructed that these solutions remain valid until such 
time that microfilm page printers were recognized by 
operating system implementors as standard peripherals. 
These solutions can be categorized as either conversion 
support or new application support. 

Conversion support 

Support software has been written for the IBM 360 
DOS and O/S operating systems. This has taken the 
form of extensions to the operating systems. 
Considerable care has been taken to insure that the change 
was local and did not disturb the rest of the system. 

The DOS extension is a supplement to DOS Logical 
IOCS and provides object program compatibility with 
problem programs written in PL/I, COBOL, RPG and 
Assembly Language. I/O Modules similar in concept 
to the ones that comprise Logical IOCS were written 
to interface a printer file definition with a physical 
magnetic tape drive. This interface routine is responsi- 
ble for adding the EBR control codes to the print 
image and forwarding it to the magnetic tape drive. A 
series of these routines reside, together with standard 
IBM supplied I/O routines, in the relocatable library. 
To invoke the extension, the user adds a single link 
editor control card and re-links his program. The out- 
put that normally appeared on a line printer is directed 
to magnetic tape in a format appropriate to the EBR. 

The result of processing this tape on the microfilm 
page printer is identical in all respects to that previously 
obtained on the line printer. 

A similar extension was provided for IBM’s O/S 360 
operating system. In this case the user is provided 
with load module compatibility. Extensions were writ- 
ten for the four QSAM move and locate mode modules. 
The modules have been modified to examine the volume 
serial number of the output data set; if the first 
three characters are “EBR” and the data set is in 
ASA mode, the file contents are re-formatted before 
being written on the assigned device. Operationally, the 
user is required to include only one control card to 
divert his output from a standard system output 
writer (SYSOUT) to a magnetic tape in EBR format. 
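
The routing rule just described can be sketched in a few lines. The function names are hypothetical and the real QSAM modules are not reproduced in this paper. The ASA carriage control meanings used (blank, 0, -, + and 1) are the standard ones; how they map onto EBR coordinate codes is an assumption, and page overflow is ignored.

```python
# Sketch of the O/S extension's decision and of ASA control interpretation.
# Function names are hypothetical; page overflow handling is omitted.

ASA_ADVANCE = {" ": 1, "0": 2, "-": 3, "+": 0}  # lines to advance before printing


def needs_ebr_format(volume_serial: str, asa_mode: bool) -> bool:
    """True when the output data set should be rewritten in EBR format."""
    return asa_mode and volume_serial.startswith("EBR")


def asa_to_line_numbers(records):
    """Turn ASA print records into (page, line, text) triples."""
    page, line = 0, 0
    for record in records:
        control, text = record[:1], record[1:]
        if control == "1":            # skip to the top of a new page
            page, line = page + 1, 1
        else:
            line += ASA_ADVANCE.get(control, 1)
        yield page, line, text


report = ["1REPORT", " first", "0third", "+over"]
if needs_ebr_format("EBR001", asa_mode=True):
    for placed in asa_to_line_numbers(report):
        print(placed)
```

Once every record has an explicit page and line, it can be written as an absolutely positioned line image, so the problem program itself needs no change.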

A similar system involving modification of the IBM 
1401 Autocoder assembler provides source language 
compatibility for the EBR. Further object program 
support is being developed for the CDC 3300, GE 400 
and RCA Spectra 70 TDOS operating systems. I 
personally feel that object program support is extremely 
important, particularly in this age of proprietary soft- 
ware where source programs may not even be available. 

New application support 

Here, the user requires an output format not obtain- 
able on a conventional line printer. He must develop a 
new program or at least modify a current one. Again, 
he should be insulated from certain details of the 
microfilm printer, such as placement of control codes 
and inter-record gaps. The approach in this case was 
to provide a general purpose output package written 
in COBOL. This package, called EBRPACK, provides 
entry points to select form overlays and character sets, 
plus additional entries to replace the standard COBOL 
printer command “WRITE dataname 1 AFTER 
ADVANCING dataname 2 LINES.” 


SUMMARY 

Software support for microfilm page printers is necessary 
and desirable; it must ultimately come from 
operating system and programming language imple- 
mentors. In the interim, operating system extensions 
providing microfilm-line printer interchangeability may 
readily be prepared. Applications requiring specific 
features associated only with microfilm page printers 
may be designed and implemented utilizing output 
packages written in machine independent languages. 



Computer microfilm— A cost cutting 
solution to the EDP output bottleneck 


by JOHN K. KOENEMAN and JOHN R. SCHWANBECK 

Oppenheimer & Company 
New York, New York 


SUMMARY 

Although the computer microfilm recorder has re- 
ceived little attention to date, this new output device 
represents a technological breakthrough which will 
have a major impact on the computer industry. In- 
stallation of a recorder generally results in a tenfold 
increase in the speed of computer output and a con- 
comitant substantial reduction in CPU time which can 
result in major data processing and report production 
cost savings. As an added bonus, a microfilm system 
is the equal of most electronic time-sharing systems 
for information storage and retrieval applications. 
Consequently, we feel that computer microfilm, al- 
though little noticed thus far, represents a major in- 
dustrial and investment concept. 

The electromechanical line printer — heretofore the 
only practical means of obtaining hardcopy rapidly 
from the computer — has a maximum output rate of 
only 2,500 characters per second. But, the computer’s 
throughput capability is 25,000 to 100,000 characters 
per second. Owing to the severe output bottleneck 
that results from this imbalance of speeds, the bulk 
of information ingested and produced by computers 
has, until now, essentially been locked on magnetic 
tape and not easily available to the computer user. 
With the advent of the computer microfilm recorder, 
which can produce output as fast as the computer can 
process data, this mass of stored information has sud- 
denly become readily available in humanly readable 
form. One of the most important questions which must 
therefore be asked is: “How much information is 
stored on magnetic tape and how badly is it desired 
by the computer user?” Our field work has consistently 
shown that an early Xerox type phenomenon exists — 
user volume rises rapidly to meet capacity. 

Because the microfilm recorder eliminates the com- 
puter output bottleneck, it also results in a major cost 
savings. This effect is most readily apparent in the 
data processing service industry, where a customer 
can now realize an approximate 40 percent to 50 per- 
cent reduction in his monthly service bureau bill if 
microfilm rather than continuous paper forms is 
accepted as computer output. 

Even greater relative savings can be realized by 
companies with medium to large-scale in-house data 
processing departments. Overall, it can be shown that 
the lowest data processing costs, at all levels of use, 
are achieved when microfilm recorders are employed 
to produce alphanumeric or graphic computer reports. 

Moreover, acceptance of computer output in film 
form automatically creates an information storage 
and retrieval system which is the equal of most elec- 
tronic systems. Although microfilm has gained a bad 
reputation because of the poorly designed equipment 
and improperly processed film which library users 
have been forced to endure for years, newly introduced 
microfilm equipment can now easily provide the quality 
of image and speed of retrieval of the most expensive 
time-sharing terminals. 

In addition to the standard data processing market, 
there is another separate and distinct market, that of 
pure information storage and retrieval, for which 
computer microfilm can compete very effectively because 
of its low cost. In fact, computer microfilm is frequently 
referred to as “the poor man’s time-sharing.” The 
service bureau charge for processing and producing one 
page of computer generated microfilm daily for one 
month is 10 percent to 40 percent that of storing one 
page of information on magnetic disc for the same time 
period. When the terminal and communications costs 
of electronic time-sharing systems are also considered, 
the cost advantage weighs even more heavily in favor 
of microfilm. In large measure, this dramatic cost 
difference is the result of the substantially greater 
density of data storage which film (1,000,000 bits/ 
sq. cm.) enjoys over magnetic media (1,000 bits/sq. 
cm.). Thus, although highly optimistic forecasts have 
been made for the growth of electronic systems for 
use in information storage and retrieval applications, 
we feel that fundamental economic considerations 
strongly suggest that computer generated microfilm, 
instead, will become the most common (although, 
obviously, not the sole) method of computer informa- 
tion storage and retrieval. 

Several other benefits are derived from the computer 
microfilm recorder which are normally of peripheral, 
but can on occasion be of prime, importance: 

• An unlimited number of report copies can be 
obtained from one computer run with no loss of 
clarity; by contrast, only four or five truly readable 
copies can be obtained from a single run when an 
impact line printer and continuous paper forms are 
used. 

• Owing to its compactness, microfilm essentially 
eliminates the problems and costs of computer 
report storage. 

• Microfilm permits dramatic reductions in com- 
puter report transportation or communications 
costs. 

Computer microfilm is not without certain draw- 
backs, however. A computer microfilm information 
system cannot be used in situations where the data 
base changes rapidly, such as in airline reservations 
or stock market quotations. It also cannot be employed 
where user interaction with the data base is desired. 
Additionally, paper possesses a distinct advantage as 
data processing output where computer usage is very 
light, or scientific applications (i.e., high computation — 
low output) are involved. 

In summary, with the development of the computer 
microfilm recorder, the most efficient processor of 
information — the computer — has finally been directly 
linked with the most efficient means of information 
storage and retrieval — microfilm. User experience to 
date strongly suggests that very large and potentially 
vast demand exists for the inexpensive and fast access 
to computerized information that this combination 
provides. Indeed there is every indication that com- 
puter microfilm could bring about a real information 
explosion. Certainly all ingredients necessary for such 
pyrotechnics are present — a sudden quantum jump in 
the speed of information output, low cost, and ease 
of use (Exhibit 1). As a consequence, we feel that the 
computer microfilm service, hardware, and supplies 
industries will experience impressive growth over the 
near and intermediate term. Indeed, output of micro- 
film recorders, which should jump from 100 units in 
1968 to about 400 units in 1969, presently is production 
limited. 

The microfilm recorder substantially reduces data 
processing and report generation costs for all users 

Although there are considerable variations in volume 
discounts and prime or off shift machine rates, a 50 
percent cost saving is common when a data processing 
service organization customer changes from paper to 
microfilm as computer output. Similarly, cost reduc- 
tions of 40 percent to 70 percent have been documented 
by heavy in-house computer users even though, in 
most cases, the availability of computer reports has 
been substantially increased as well. Although the 
relative cost savings of the in-house user and the service 
bureau customer are similar, the source of these savings 
is not. Whereas essentially all the service bureau cost 
reduction can be attributed to lower computer time 
charges, the bulk of in-house economies derives from 
labor and material savings. On balance, however, it 
can be shown that the lowest data processing costs are 
always obtained when a microfilm recorder is employed. 

Service center cost reductions 

To obtain 1,000 pages (and three carbon copies) 
of processed information, a data processing service 
organization customer presently accepting paper out- 
put will incur about one hour of IBM 360/30 machine 
rental at $65.00 per hour and a materials charge of 
$30.00 for continuous forms. Thus, total service bureau 
charges for the processing and production of 1,000 
pages of information will total about $95.00 when paper 
is used as the computer output medium. 

Exhibit 1 — Computer microfilm vs impact printers: 
Distinct advantages 

If, however, a change to computer microfilm is 
made, the cost of a similar run drops to about $40.00 
to $45.00. Because the economics of large, fast computers 
can be used to advantage when the machine is 
no longer output bound, most computer microfilm 
programs are run on an IBM 360/65 or equivalent. 
Because the time necessary to process 1,000 pages of 
information on a 360/65 is about 0.2 minutes, total 
data processing charges at $600 per hour amount to 
only about $2 or $3. Conversion from magnetic tape 
to a single microfilm original can be accomplished for 
about $30.00 (three cents per original page), and the 
cost of three copies will add an additional $10.00 
(3.3 mills per page). Thus, for comparable data proc- 
essing and report production services, a computer micro- 
film service bureau will cost only $40.00 to $45.00, in 
contrast to about $95.00 for a traditional data proc- 
essing service organization (Exhibit 2). 
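
The 1,000-page comparison can be reworked directly from the rates quoted above. The sketch uses only those rates; the variable names are illustrative, and the 3.3 mills is taken to apply to each page of each copy.

```python
# Rework the 1,000-page service bureau comparison from the figures in the text.

PAGES = 1_000
COPIES = 3

# Paper: one hour of 360/30 time plus continuous forms.
paper_total = 65.00 * 1 + 30.00

# Microfilm: 0.2 minutes of 360/65 time at $600 per hour, three cents per
# original page, and 3.3 mills per page for each copy.
cpu_cost = 600.00 * (0.2 / 60)
original_cost = 0.03 * PAGES
copy_cost = 0.0033 * PAGES * COPIES
microfilm_total = cpu_cost + original_cost + copy_cost

saving = 1 - microfilm_total / paper_total
print(f"paper ${paper_total:.2f}, microfilm ${microfilm_total:.2f}, saving {saving:.0%}")
```

Almost all of the microfilm total is the tape-to-film conversion charge; the computer time itself contributes only about two dollars.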

In-house cost reductions 

In the next exhibit (3), it can be seen that although 
the installation of a microfilm recorder (SD4360) in- 
creases the fixed cost of a data processing installation 
about $2,000 per month, variable costs for materials 
are so low that the recorder becomes economically 
advantageous after 90,000 to 100,000 pages per month 
of output, or the equivalent of five to six machine 
hours per day of a relatively small four-tape System 
360/30. Thus, an in-house installation operating two 
shifts can achieve a 25 percent to 30 percent cost reduction 
through the elimination of machine shift premiums, 
labor, and materials savings. Extensive Army studies 1 
have shown that operating savings of 40 percent to 
70 percent can be achieved when three-shift operation 
or multiple satellite computers with attached line 
printers are involved. 
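
The break-even statement above implies a per-page saving that the text does not quote directly. The sketch below backs it out under the assumption of a simple linear cost model (a fixed monthly increase offset by a constant saving per page); that model is an assumption for illustration, not something stated in the exhibit.

```python
# Back out the per-page saving implied by the stated break-even volume.
# The linear cost model is an assumption; only the $2,000 and the
# 90,000 to 100,000 page range come from the text.

EXTRA_FIXED_PER_MONTH = 2_000.00


def implied_saving_per_page(break_even_pages: int) -> float:
    return EXTRA_FIXED_PER_MONTH / break_even_pages


def monthly_net_saving(pages: int, saving_per_page: float) -> float:
    return pages * saving_per_page - EXTRA_FIXED_PER_MONTH


low = implied_saving_per_page(100_000)
high = implied_saving_per_page(90_000)
print(f"{low * 100:.1f} to {high * 100:.1f} cents per page")
print(monthly_net_saving(200_000, low))
```

Under this model, every page beyond the break-even volume saves roughly two cents, so the advantage grows steadily with output.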

Exhibit 2 — Computer microfilm vs impact line printer: 
Service center costs 

Exhibit 3 — Computer microfilm vs impact line printer: 
Leased in-house costs 

The magnitude of the demand for computer reports 
that is presently unsatisfied because such reports are 
considered uneconomical can perhaps be judged by 
noting that if the management of a corporation with as 
little as $15 million in annual sales desired detailed 
daily reports on finished parts inventory, accounts 
receivable, and unfilled orders, almost seven hours 
of computer time would be consumed in printing out 
these reports. 2 Incremental costs of about $3,000 to 
$4,000 per month for materials and possibly $2,000 to 
$2,500 for additional labor would probably thereby 
be incurred. Thus, although the utility of detailed 
management reports such as these is probably high, 
we think it likely that the operational difficulties and 
the extremely high EDP costs necessary to produce 
such information have led many manufacturing companies 
to forego such data until now. However, with 
the installation of a computer microfilm recorder, the 
same $15 million company described above could produce 
the same reports at an incremental cost of only 
$400 to $500 per month for materials and no incremental 
cost for labor. Thus, the company would then find it 
feasible to produce these reports. Operating experience 
to date of computer microfilm recorder owners certainly 
would point toward such a conclusion. 

Exhibit 4 — Computer microfilm vs paper continuous 
forms output: Substantial savings in materials, 
machine rental, and labor costs 

Moreover, it is important to note that the cost 
curve of a computer microfilm data processing in- 
stallation is essentially flat out to very large quantities 
of output (Exhibit 3). Thus, the corporate manager 
would now be able to obtain additional detailed re- 
ports almost instantaneously at virtually no incre- 
mental cost. 

Experience to date indicates that most managements 
will quickly begin to utilize the full capacity of a newly 
installed recorder. 

For example, in one case, a large insurance 
company installed a microfilm recorder in May 1967. 
Although the equipment operated only five hours 
per week when first installed, after approximately 
one year, utilization had increased tenfold to 50 
hours per week. In another case, a manufacturing 
concern which began using prototype computer 
microfilm equipment in 1967 had increased its film 
consumption to 20 million feet per year (400 million 
pages) by 1967 and reached 38 million feet (760 mil- 
lion pages) in 1968. 

The substantial savings in consumable materials 
costs, labor costs, and machine rental are, of course, 
the three major cost elements considered in calcu- 
lating operational savings (Exhibit 4). 

Additionally, however, considerable savings in com- 
puter report shipping and storage costs can frequently 
be realized, although these expense elements have not 
been included in our calculations (Exhibit 5). 

Exhibit 5 — Computer microfilm vs paper continuous 
forms output: Substantial savings in storage and 
shipping costs 


Cost reductions for all users 

In summary, then, by superimposing the costs of 
service centers (Exhibit 2) on those of in-house in- 
stallations (Exhibit 3), it can be seen that the use of 
a computer microfilm recorder will always result in 
the lowest data processing cost at all levels of usage 
(Exhibit 6). 

Exhibit 6 — Summary of cost comparisons: Computer 
microfilm recorder results in the lowest costs at 
all levels of usage 

These facts should be apparent: 

1. A computer microfilm service center is always 
about 50 percent cheaper than a paper service 
center, and this cost advantage probably will 
go higher. 

2. A computer microfilm service center is the 
least expensive data processing alternative up 
to about 200,000 pages of output per month. 
(200,000 pages per month is the maximum output 
of a single shift working six days per week 
on a 360/30 with one attached line printer.) 

3. Beyond 200,000 pages of output per month, 
an in-house computer with a microfilm recorder 
is by far the least expensive data processing al- 
ternative. 

4. An in-house computer/microfilm recorder can 
bring about a cost saving vis-a-vis a computer/ 
line printer installation beyond about 90,000 
to 100,000 pages per month, or only five to 
six hours of computer time per day, with paper 
output. 
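
The four points above amount to a lookup on monthly page volume, which can be written down as a small sketch. The thresholds are the ones quoted in the list; the function names are illustrative, and the upper end of the 90,000 to 100,000 range is used as the in-house break-even.

```python
# Encode the volume thresholds from the four points above as a lookup.
# Threshold values come from the text; function names are illustrative.

SERVICE_CENTER_LIMIT = 200_000   # pages per month (points 2 and 3)
IN_HOUSE_BREAK_EVEN = 100_000    # upper end of the 90,000-100,000 range (point 4)


def cheapest_alternative(pages_per_month: int) -> str:
    """Least expensive alternative overall, per points 2 and 3."""
    if pages_per_month <= SERVICE_CENTER_LIMIT:
        return "computer microfilm service center"
    return "in-house computer with microfilm recorder"


def recorder_beats_line_printer(pages_per_month: int) -> bool:
    """For an installation that is already in-house, per point 4."""
    return pages_per_month > IN_HOUSE_BREAK_EVEN


for volume in (50_000, 150_000, 400_000):
    print(volume, cheapest_alternative(volume), recorder_beats_line_printer(volume))
```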

Thus, if decisions regarding an in-house capability 
versus utilization of a service bureau were always ra- 
tional and financially sound, 100 percent conversion 
from paper to computer microfilm output could be 
expected. To anticipate a conversion ratio of 100 per- 
cent is, of course, unrealistic. Nonetheless, the pricing 
revolution which the computer microfilm service com- 
panies have brought about in the data processing 
service industry should result in very extensive use of 
the computer microfilm recorder in this segment of 
the computer industry. The small data processing user 
will be the primary beneficiary of the dramatic reduc- 
tion in data processing service bureau costs. Similarly, 
medium-scale to heavy computer users will find the 
substantial cost and operating advantages of an in- 
house recorder sufficiently compelling to bring about 
heavy conversion to microfilm output in this market 
segment. 


Microfilm is the most efficient medium for storing 
and accessing generated computer data 

Computer microfilm is actually, by a wide margin, 
the most efficient and economical storage and retrieval 
system for computer generated information. Microfilm 
has always been superior to paper from a bulk handling 
and storage standpoint. With the introduction of the 
computer microfilm recorder, it can now also approxi- 
mate electronic time-sharing systems in performance 
for the great majority of information storage and re- 
trieval applications. Thus, computer output on micro- 
film can provide a simple, fast information system far 
superior to those currently in use. Indeed, computer 
microfilm service bureau managements indicate that 
it is not the substantial cost advantage of film over 
paper computer output which is most attractive to 
prospective customers, but rather its usefulness as an 
effective information system. The dramatic cost benefits, 
however, can be an extremely effective sales tool 
in getting the customer to consider microfilm seriously. 

Microfilm joins the computer era 

Development of the computer microfilm recorder 
has brought in its wake a flurry of product develop- 
ment activity aimed at greatly facilitating access to 
information on microfilm. Most individuals think of 
microfilm only as an archival medium — for storing 
outdated information for which a need might or might 
not arise at some time in the future. Actually, the ac- 
tive use of microfilm for the storage and retrieval of 
information in daily use has been practiced by some 
pioneering users and companies for years. For the most 
part, these have been extremely large users (e.g., 
Social Security Administration). We feel that in large 
part the reluctance to adopt active microfilm systems 
has been due to the fact that information in such sys- 
tems had to be manually sorted, updated, and coded — 
a tedious and time-consuming task. 

Now, however, this task has been eliminated through 
the development of computer microfilm coding systems 
which can provide manual access to one page out of 
73,500 in one to five seconds. 

Additionally, computer information can now be obtained
on microfilm with far greater ease, and the time required
has been reduced from days to literally minutes. One
manufacturer has adopted a marketing program stress- 
ing “on time” information rather than “real time”, 
which is, in fact, an accurate description. There is 
virtually no computerized information which cannot 
be obtained overnight in a fully useful, properly indexed 
format. 
In sum, the user of computer microfilm has access
to a “poor man’s time-sharing” information system, 
as some have termed it, with no addition to his CPU 
costs. 

Computer microfilm competes effectively 
against time-sharing 

Many feel that time-sharing will become the most
common method of providing access to computer
generated information. But it can be shown that, for
most applications, storing and retrieving information
electronically is very uneconomical relative to
a computer microfilm system.

For example, in one specific application, a data 
storage capacity of 15,000 pages, to be updated 
daily, was required. The effective cost of this appli- 
cation on a commercial time-sharing disc file system 
equalled about $3.00 per page or a total of $45,000 
per month. On microfilm, this same information 
can be updated once a day for approximately $0.60 
per page or $9,000 per month— a storage cost re- 
duction of almost 80 percent. Moreover, the time- 
sharing system would incur additional costs for 
terminal connect time and computer search time. 
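
The arithmetic behind this example can be checked with a short sketch. The page count and per-page costs are the ones quoted above; the function and variable names are illustrative only. On these figures the storage cost reduction works out to 80 percent.

```python
# Monthly storage cost comparison, using the figures quoted in the text.
# Costs are held in integer cents to avoid floating-point rounding noise.
PAGES = 15_000
TIME_SHARING_CENTS_PER_PAGE = 300   # $3.00 per page on a time-sharing disc file
MICROFILM_CENTS_PER_PAGE = 60       # $0.60 per page on microfilm, updated daily


def monthly_cost_dollars(pages: int, cents_per_page: int) -> float:
    """Return the monthly storage cost in dollars."""
    return pages * cents_per_page / 100


time_sharing_cost = monthly_cost_dollars(PAGES, TIME_SHARING_CENTS_PER_PAGE)
microfilm_cost = monthly_cost_dollars(PAGES, MICROFILM_CENTS_PER_PAGE)
reduction = 1 - microfilm_cost / time_sharing_cost

print(f"time-sharing: ${time_sharing_cost:,.0f} per month")
print(f"microfilm:    ${microfilm_cost:,.0f} per month")
print(f"reduction:    {reduction:.0%}")
```

The comparison covers storage only; terminal connect time and computer search time for the time-sharing system would come on top of it.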

Therefore, we feel that microfilm, as a medium of 
access to computer information, will become much 
more commonly employed than time-sharing in the 
future. Time-sharing, however, will always be required 
for applications in which immediate interaction with 
the data base is desired. 

Microfilm permits the storage cost savings just
described because it has a significantly greater storage
density than the magnetic storage media used
in time-sharing systems (i.e., disc packs and data cells).
While it is only possible to store approximately 1,050
bits per square centimeter on computer magnetic materials,
it is possible to store 1,000 times this amount,
or over one million bits, on a square centimeter of microfilm.

In addition to storage costs, the relative disadvan- 
tages of time-sharing for information storage and re- 
trieval include substantially higher terminal and com- 
munications costs (Exhibit 7). 

As shown in the exhibit, a full page of information 
can be accessed in one to four seconds on the CARD 
device. To equal this speed with a time-sharing system 
a high-cost video terminal and Telpak-D communica- 
tions line must also be employed. 

Exhibit 7 — Microfilm viewers vs time sharing systems:
Favorable comparison in terms of cost and speed

As a result of these cost factors, microfilm is the more
economical of the two systems for most normally encountered
information storage and distribution problems.
The surface illustrated in Exhibit 8 delineates
the points (determined by file size, number of users,
and update frequency) at which a microfilm system
is roughly cost equivalent to an electronic information
storage and retrieval system.

For the problems located within the surface, a micro- 
film system is less expensive; for those outside the 
surface, electronic systems are less expensive. 

For example, the exhibit demonstrates that when 
information must be available to 200 users and up- 
dated every business day, a microfilm system is more 
economical for files of 14,000 pages or less. A file of 
this size could contain the daily closing stock quota- 
tions for the NYSE, ASE, and OTC market for over 
four years. Similarly, a 14,000 page file could contain 
all the records for payroll, personnel, and finished 
goods inventory (plus 10,000 accounts receivable 
records) for an average industrial corporation with 
sales of $800 million per year. 3 

Exhibit 8 — Computer generated microfilm vs electronic
computer systems: Most economical information
storage and retrieval system for most commonly
encountered applications

There are two types of commonly encountered applications
for which microfilm is not a suitable replacement
for time-sharing: when the user wishes to
input, manipulate, and extract data at will, and when
updating is required more than once a day, such as
in transportation reservation systems (these cases are
located above the update frequency = 20 times/month
plane in Exhibit 8). Whereas time-sharing allows information
stored in a computer to be updated immediately
and made readily available in updated form
to all users, with a microfilm system four to six hours
is the minimum time one may expect for file update,
preparation, and distribution.

However, in most other commonly encountered information
storage and retrieval applications, computer
processed data is required for informational purposes
only, such as in referencing records to service a customer
inquiry. In these cases, a microfilm information
system is as effective as a time-sharing system and far
less expensive.

Recently, hybrid information systems have been 
introduced in which a data base is stored on microfilm 
while recent updates and changes can be retrieved 
electronically from computer memory. These systems, 
which utilize the advantages of both microfilm and 
time-sharing systems, should find widespread accept- 
ance in the future. 
Design of distributed communications
system— A case study 


by N. NISENOFF 

Computer Command and Control Company 
Washington, D. C. 


INTRODUCTION 

The development of a concept for a Department of 
the Army Civilian Personnel Management and Man- 
power Data Reporting System and an Optimum Auto- 
matic Data Processing System was undertaken by 
Computer Command and Control Company in June, 
1967.* 

The work was initiated by the Department of the 
Army to meet the increasing demand for more detailed 
information about civilian employees, as required in 
connection with Army-wide civilian personnel career 
management programs, and in view of new and more 
detailed general governmental reporting requirements. 
In addition, the system was to be capable of main- 
taining data concerning the wide range of skills and 
experience of Army personnel. A further goal was the 
reduction to a minimum of the time delay in com- 
municating relevant personnel data for the purpose 
of applicant screening. 

The system, as developed, is a generalized civilian 
personnel information system that embraces all as- 
pects of the Army’s civilian personnel management 
activity and control. It provides the information 
gathering, processing, storing, querying and reporting 
capabilities to meet the requirements of Headquarters, 
Department of the Army; all echelons of field com- 
mands; the Department of Defense; the Bureau of 
the Budget; the Civil Service Commission; and other 
government agencies. 


* This effort has been performed for the Deputy Chief of Staff 
for Personnel, United States Army, under contract DAHC15 
67 C 0265. 


The design concept: 

1. Provides a powerful, efficient, open-ended, 
processing capability at a cost level that is 
the minimum commensurate with the system 
requirements. 

2. Utilizes the most advanced (yet proven) hardware
and information entry, storage and retrieval
techniques available so as to effect data entry,
validation, distribution, storage and organized
retrieval with minimal human intervention.

3. Offers direct, rapid, complete and easy exchange 
of both formatted and unformatted personnel 
information among authorized individuals and 
offices at all levels. 

4. Provides standardized functional personnel management
information formats and processing
techniques, together with adequate on-line
analytical tools.

5. Makes exchange of data with the Civil Service
Commission, the Department of Defense and
with other Army systems simple and easy, provided
data definitions have been standardized.

Insofar as practicable, use has been made of present 
data bases. By applying automatic file conversion 
techniques previously developed, it will be possible 
to efficiently convert many existing data bases into 
random access files that can be electronically updated 
and queried. Particular attention has also been given 
to the problems of interfacing with and making the 
best use of existing automated or partly automated 
general management information systems within the 
Department of the Army. 
The scope of the project is in part indicated by the
following: There are over half a million civilian em- 
ployees of the Army paid from appropriated funds, 
of whom about 140,000 are foreign nationals. In addi- 
tion, there are about 200,000 civilian employees over- 
seas who are paid from non-appropriated funds but 
who are administered by Civilian Personnel Offices. 
To service these employees, there are some 200 Civilian 
Personnel Offices scattered around the world. For just 
the United States Civil Service employees, it is es- 
timated that approximately two billion characters of 
information will need to be carried in the Army Ci- 
vilian Personnel Management and Manpower Data 
Reporting System. 

Following the initial data gathering and analysis 
phase, the record and file structuring effort was under- 
taken. During this phase all candidate data elements 
were identified, classified and organized into files. 
File usages were then examined and files were assigned 
to appropriate storage media. For example, the data 
elements required to develop a reply to a relatively 
frequent query were placed in a fast mass random ac- 
cess storage subsystem. On the other hand, any data 
element required but infrequently was placed in the 
magnetic tape storage subsystem. 
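
The assignment rule described above can be sketched as a simple threshold on query frequency. The threshold value and the sample data element names below are hypothetical, chosen only for illustration; the study does not state the cut-off it used.

```python
# Sketch of assigning data elements to storage media by how often they are queried.
# ASSUMPTION: the threshold and the sample elements are hypothetical.
FREQUENT_QUERY_THRESHOLD = 30  # queries per month (hypothetical cut-off)


def assign_storage(queries_per_month: int,
                   threshold: int = FREQUENT_QUERY_THRESHOLD) -> str:
    """Frequently queried elements go to random access storage, the rest to tape."""
    if queries_per_month >= threshold:
        return "mass random access storage"
    return "magnetic tape storage"


# Hypothetical data elements with assumed monthly query counts.
sample_elements = {
    "current_grade": 400,
    "prior_training_history": 2,
}

assignments = {name: assign_storage(count) for name, count in sample_elements.items()}
for name, medium in assignments.items():
    print(f"{name}: {medium}")
```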

Given the results of the file structuring study, four 
separate and distinct Continental United States 
hardware configurations were postulated, and two 
additional prepared for overseas components. Each 
configuration is capable of performing the data proc- 
essing functions required. The Continental United 
States configurations specified were:

1. A centralized single computer system;

2. A regionalized five computer system;

3. A decentralized twelve computer system; and

4. A localized twenty-one computer system.

A cost analysis was then performed to evaluate each
configuration. The overall results of this evaluation,
including the cost of initial loading of the data bank,
are shown in Figure 1.

After considering this cost data plus the other advantages
and disadvantages, Configuration I was selected.
It is of interest to note that the total cost of the system
per Civilian Personnel Office for Configuration I is
about the same as the salary and overhead cost of
one GS-5.**

** At the time the report was prepared, a GS-5 earned $5,732.00
per year.

Figure 1 — Monthly cost per CPO for four examined
configurations (axes: configuration number I to IV and
number of computer sites)

To establish the practicality of the implementation
of the proposed system, a break-even analysis was
performed. A reduction in work force of four percent
within the Civilian Personnel Offices is the break-even
point, while a six percent reduction would produce
net savings of approximately $1.25 million per
year. There is evidence that in an automated system,
this reduction in work force could be made with no
loss of efficiency or productivity. In fact, the automated
system could be expected to greatly increase staff
efficiency and productivity, as well as provide management
information vastly improved with respect to
timeliness, completeness, accuracy and internal consistency.
Finally, analytical services would be available
which cannot be achieved with a manual system.
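
The two break-even figures can be related by a small sketch. It rests on two assumptions that the text does not state: that savings grow linearly with the percentage reduction in work force, and that the quoted 1.25 million is in dollars per year. Under those assumptions the figures imply an annual staff cost and an annual system cost, as computed below; these implied values are derived here and are not quoted in the paper.

```python
# Break-even sketch. ASSUMPTION: net savings are linear in the work force reduction,
#   net_savings(r) = staff_cost * r / 100 - system_cost
# and the quoted savings figure is in dollars per year.
BREAK_EVEN_PERCENT = 4             # reduction at which savings equal system cost
SAMPLE_PERCENT = 6                 # reduction quoted with a net saving
NET_SAVINGS_AT_SAMPLE = 1_250_000  # dollars per year at a six percent reduction

# The extra two percentage points beyond break-even account for all the net saving.
implied_staff_cost = NET_SAVINGS_AT_SAMPLE * 100 / (SAMPLE_PERCENT - BREAK_EVEN_PERCENT)
implied_system_cost = implied_staff_cost * BREAK_EVEN_PERCENT / 100


def net_savings(reduction_percent: float) -> float:
    """Net annual savings for a given percentage reduction in work force."""
    return implied_staff_cost * reduction_percent / 100 - implied_system_cost


print(f"implied annual staff cost:  ${implied_staff_cost:,.0f}")
print(f"implied annual system cost: ${implied_system_cost:,.0f}")
print(f"net savings at 6 percent:   ${net_savings(SAMPLE_PERCENT):,.0f}")
```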

The software and programming aspects of the overall 
problem were not examined as thoroughly as desired. 
Certain assumptions were made; among these were:

1. The computer hardware would be dedicated 
to the application. 

2. Computer manufacturer's software support 
would be adequate for all needs except specific 
applications packages. 

3. The query language and the storage and retrieval
subsystems were not specified.
Fortunately, the study was not performed without
prior experience or knowledge concerning these points. 
Previous efforts by the Company, as well as members 
of the team, had been concerned with these very 
points. Estimates were made and employed. 

A subsequent investigation*** required a more 
detailed and thorough examination of these very 
points. The results of that study will be reported upon 
in the near future. 

*** Contract No. FA68WA-1913, Design of FAA Manpower
and Personnel Information System.

The dimensions of the problem

General description

Within the Department of the Army, military and
civilian personnel administration is centered in single
offices, at all levels, wherever both military and civilian
personnel are found in significant numbers. However,
at the execution level, branching is noted between the
military and civilian personnel staffs in executing
day-to-day detailed, direct operational responsibilities.
This will continue to be the case in the future.

Table I indicates the distribution of civilian employees
with respect to citizenship, Army area in
the Continental United States or geographical area
outside the Continental United States, and membership
of the staff of Army Materiel Command, Corps of
Engineers or “other” organizations. Additionally, it
presents the number of Civilian Personnel Offices
servicing the ten designated groupings.

TABLE I — Distribution of civilian employees and civilian personnel offices

Army or       No. of   No. of   Total U.S.   Foreign      Grand
Geog. Area    Orgs.    CPO's    Employees    Nationals    Total

I                87       63      147,800                147,800
III              34       22       57,200                 57,200
IV               25       20       45,200                 45,200
V                37       30       63,600                 63,600
VI               29       25       42,600                 42,600
Hawaii            7        1        5,900                  5,900
Alaska            6        3        3,100                  3,100
Far East          4        5        5,100      81,000     86,100
Europe           19       16        7,600      56,200     63,800
SOCOM             2        2        1,900       2,800      4,700

Totals          250      187      380,000     140,000    520,000


Breakdown of U.S. Employees

Army or                  Corps of                 Total U.S.
Geog. Area    AMC        Engineers    Other       Employees

I              72,400      14,300      61,100      147,800
III            19,700      10,700      26,800       57,200
IV             16,400       7,000      21,800       45,200
V              31,700       8,500      23,400       63,600
VI             19,800       6,700      16,100       42,600
Hawaii                                  5,900        5,900
Alaska                        500       2,600        3,100
Far East                                5,100        5,100
Europe                        300       7,300        7,600
SOCOM                                   1,900        1,900

Totals        160,000      48,000     172,000      380,000

Information and processing requirements

As a basic premise, it is assumed that processed numeric,
textual and graphic information delivered to the user
must be adequate to meet both predictable and ad hoc
needs. Processing and delivery must be timely.

The basic parts of such an automated system are: 

1. A central processor (or processors).

2. Data storage capacity and retrieval capability. 

3. A means for inputting and outputting informa- 
tion. 

4. Adequate data communications. 

Central processor 

In discussing central processors, there are two basic 
factors to consider. If there is to be but one central 
processor, then the only requirement of importance is 
that it have the capacity and time available so as 
to be able to handle the input and output loads and 
perform the required processing. 

On the other hand, if a multiple computer configura- 
tion is decided on, in addition to the requirement set 
forth above, there has to be a distinct set of software 
programs for each local or decentralized computer 
type which is not internally compatible with the master 
computer, plus additional programs to provide for 
transfer of data from one computer to another. This 
adds considerably to both the cost and the complexity 
of the system. 

If the central processor is not dedicated to civilian 
personnel use, but is shared, the particular priority 
that would most probably be accorded civilian person- 
nel information processing would entail delays of 
indeterminate length. As the number of computers 
handling civilian personnel information is increased, 
the likelihood of sharing the computer increases 
greatly. At the same time, any procedure that involves 
the output of more than one computer would not be 
completed until time is available on the last available 
computer. It is not only the delay that can prove to 
be vexing; it is also the fact that it is most difficult to 
ascertain how long the delay might be. 

Data storage capacity and retrieval capability

When, as is expected, 40 percent of all employees 
are in the career management program, storage for 
approximately 1.8 billion characters of information 
will be required for the records of the United States 
Army civilian personnel. Storage for an additional 
600 million characters will be necessary for United 
States employees overseas and foreign national em- 
ployees. 


Data elements 

In determining how large an individual record
would be, it is recognized that the record of a new employee
will not be as extensive as that of an employee
who has worked many years. To measure that difference,
the number of characters in a typical personnel
record was recorded for each of three groups: GS-1 through
GS-5, GS-6 through GS-11, and GS-12 through GS-18.

Table II is a summary of information concerning 
the data elements which are required to meet both 
present and anticipated needs. It also provides the 
numbers of characters required for each of three record 
categories. 

Input and output requirements 

There are 160 Civilian Personnel Offices in the Continental
United States, which will require a total of
from 300 to 400 input/output consoles, depending on
the make or type finally chosen. The overseas Civilian
Personnel Offices will require an additional 60-90
consoles.

Both from qualitative and cost standpoints, the ef- 
fects of proper or improper console selections will 
clearly be very significant in view of the large number 
of units involved. 

Four broad categories of information are present 
within the system. 

1. Information necessary to update the files and 
records. 

2. Queries and responses. 

3. Statistical data. 

4. Processing outputs in general. 

Detailed analysis of data presently handled or
required in updating files and records and in meeting
all except ad hoc query demands results in 20 characters/
man/day of data input and 18 characters/man/day of
output. To handle ad hoc queries and their responses,
an additional 12 characters/man/day (six input,
six output) are considered adequate. These statistics
are based on there being 240 working days a year.
Table III shows the known inputs to and outputs
from the central processor and the numbers of characters/
man/year and characters/man/day.
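
The conversion from annual to daily figures can be reproduced with a brief sketch. The annual totals are those given in Table III, and the 240 working days and the allowance of six characters each way for ad hoc queries come from the text; the function and variable names are illustrative only.

```python
# Converting annual characters per man into characters per man per day.
WORKING_DAYS_PER_YEAR = 240
KNOWN_ANNUAL_INPUT_CHARS = 4_750    # total inputs, Table III
KNOWN_ANNUAL_OUTPUT_CHARS = 4_427   # total outputs, Table III
AD_HOC_CHARS_EACH_WAY = 6           # ad hoc allowance per man per day, input and output


def per_working_day(annual_chars: int) -> int:
    """Round annual characters per man to whole characters per man per day."""
    return round(annual_chars / WORKING_DAYS_PER_YEAR)


known_input = per_working_day(KNOWN_ANNUAL_INPUT_CHARS)
known_output = per_working_day(KNOWN_ANNUAL_OUTPUT_CHARS)
total_input = known_input + AD_HOC_CHARS_EACH_WAY
total_output = known_output + AD_HOC_CHARS_EACH_WAY

print(f"known:  {known_input} in, {known_output} out")
print(f"design: {total_input} in, {total_output} out, "
      f"{total_input + total_output} total characters/man/day")
```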


Alternative configurations 

As a starting point, a single computer system was
considered. Next, an integrated system of six computers
was examined, then one with 13, and finally, one with
22. The process halted at this point because it was
becoming apparent that the complexities of a multi-computer
system (and, as a result, the unpredictable
time responses as well as costs) were growing at a
far higher rate than any advantages that might accrue.

TABLE II — Summary information: Data elements

                                   Maximum   Average Sized Record
                                   Record    GS 1-5                GS 6-11 Plus W. B.    GS 12-18
                                   Size      Plus W. B.            Supervisors
Data Elements                      (Char)    Single     Multiple   Single     Multiple   Single     Multiple
                                             Computer   Computer   Computer   Computer   Computer   Computer

Organization Elements* (O)
Position Elements (P)                  164       164        164        164        164        164        164
Civilian Personnel Elements (CP)    11,132     2,796      2,796      4,335      4,335      6,192      6,192
Career Management Program
  Elements (CM)                      7,548                           2,562#     7,548#     2,456#     7,548#
Executive Assignment**
  Elements (EA)                      2,467
Primary Personnel (PP) Elements        139                  139                   139                   139
Statistics (ST)*

TOTAL                                          2,960      3,099      7,061     12,186      8,812     14,043

Using the statistical data tabulated below, the average record length would be 4,700 for a single computer system
and 7,100 for a multicomputer installation.

GS 1-5 plus W. B. less W. B. Supervisors: 230,000 employees = 60%

GS 6-11 plus W. B. Supervisors: 113,000 employees = 30%

GS 12-18 and PL Appointees: 37,000 employees = 10%

(U.S. citizens only)

* The storage requirements for organizational and statistical elements are insignificant as compared with the
personnel data storage requirements. Therefore, no values are assigned.

** Executive assignment data not carried in Army System but by Civil Service Commission; presented here for
information only.

# The character count for average record length in the CM category is kept at maximum record length since it is
expected that personnel in this program will have been in the civil service for several years and therefore will
have need for all the record space available.
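
The average record lengths quoted with Table II can be approximated as a population-weighted mean of the TOTAL row. The sketch below assumes the six totals alternate single and multiple computer within each grade group, as the table is read here. On that reading the single computer average comes out near the quoted 4,700, and the corresponding storage comes to about 1.8 billion characters; the multiple computer average comes out somewhat below the quoted 7,100, so the original calculation presumably used weights or rounding that cannot be recovered from the table.

```python
# Population-weighted average record length from the Table II totals.
# ASSUMPTION: the TOTAL row is read as (single, multiple) pairs per grade group.
EMPLOYEES = {"GS 1-5": 230_000, "GS 6-11": 113_000, "GS 12-18": 37_000}
SINGLE_COMPUTER_CHARS = {"GS 1-5": 2_960, "GS 6-11": 7_061, "GS 12-18": 8_812}
MULTIPLE_COMPUTER_CHARS = {"GS 1-5": 3_099, "GS 6-11": 12_186, "GS 12-18": 14_043}


def total_characters(chars_per_record: dict) -> int:
    """Total characters needed to hold one record per employee."""
    return sum(EMPLOYEES[group] * chars_per_record[group] for group in EMPLOYEES)


def average_record_length(chars_per_record: dict) -> float:
    """Weighted mean record length across the three grade groups."""
    return total_characters(chars_per_record) / sum(EMPLOYEES.values())


avg_single = average_record_length(SINGLE_COMPUTER_CHARS)
avg_multiple = average_record_length(MULTIPLE_COMPUTER_CHARS)
storage_single = total_characters(SINGLE_COMPUTER_CHARS)

print(f"single computer average:   {avg_single:,.0f} characters")
print(f"multiple computer average: {avg_multiple:,.0f} characters")
print(f"single computer storage:   {storage_single / 1e9:.2f} billion characters")
```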

The first configuration involves a single computer
located in the Washington area. Figure 2 shows a
schematic of this. Each Civilian Personnel Office
would input to and receive data from this one computer.
A complete record concerning each civilian
working for the Department of Army would be maintained
in this computer, except for foreign nationals
in Europe and the Far East. This would result in a
total of approximately 380,000 records.

At each Civilian Personnel Office, there will be at
least one input/output console. Here, all information
would be entered that would update personnel records,
as well as Civilian Personnel Office queries. Each
console would also receive all query-response outputs
as well as general information outputs and record
printouts that are needed for all personnel management
purposes.
TABLE III — Major inputs to and outputs from system (Annual)

INPUTS                                                  No. of Characters

Payroll Change Slip (2515)                                      150
Request for Personnel Action (52)                               300
Application for Federal Employment (57)                         750
Request for Referral List (2302-2)                              350
Installation Training (750)                                   1,000
Employee Performance & Career Appraisal (2302-4)              1,600
Employee Performance Rating (1052)                              100
Job Description Rewrite (374)                                   400
Qualification Record (2302)                                     100

Total Inputs                                                  4,750

OUTPUTS

Referral List Response (2302-2)                                 165
Notification of Personnel Action (50)                           300
Career Employment Record (2302-5)                             3,600
Position Review (275)                                           100
Occupational Inventory of Civilian Positions (1629)             100
Table of Distribution & Allowances (2952)                        70
Civilian Personnel by Basic Rate (3100)                          70
Civilian Personnel Employment Report (3250)                      22

Total Outputs                                                 4,427


Known Inputs          4,750 = 20 Characters/man/day
Known Outputs         4,427 = 18 Characters/man/day

Known Requirements    9,177 = 38 Characters/man/day

For ad hoc queries and presently unanticipated requirements, 6 characters/day each for input and output are
added.

For system concept development the following values are used:

Total Inputs          26 Char/man/day
Total Outputs         24 Char/man/day
Total Requirements    50 Char/man/day


The system will be able to accept inputs of authorized 
manpower spaces and changes to them as these allo- 
cations or changes to specific spaces are made. By 
comparing the data with position information re- 
ported by Civilian Personnel Offices, it will be easy to 
determine, at any time, discrepancies between vacan- 
cies and established positions. 

The second configuration provides for five computers 
carrying personnel records in addition to the RAPID 
system computer complex in Washington. Figure 3 
is a schematic diagram of this. 

These five processors would be located, with numbers 
of Army and Civilian Personnel Offices serviced, as 
shown in Table IV. 

Personnel records of employees assigned to each
Civilian Personnel Office would be stored in the computer
for the respective Army area. Each Civilian
Personnel Office would use its consoles to communicate
with the computer in the Army area where it was
located, in the same manner as described previously
for the operation of a single computer system.

These five area computers would be connected 
electrically to the RAPID computer in Washington. 
The central (Washington) computer would store some 
20 critical elements of information (approximately 
100 characters) concerning each non-career employee 
plus additional career management information for 
each employee in the career management system. 

Figure 2 — Single computer configuration data input
and output (160 CPOs in U.S. connected through a
communications control unit)

This master computer would thus be able to produce
statistical data as well as provide responses to many
queries without requiring access to the area computers.
Answers not obtainable from the RAPID system
computer complex would be generated by polling the
computer(s) able to furnish the answer, or if the specific
computer containing the desired information was not
known, all five computers would be queried.
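
The query handling just described can be sketched as a small routing function. This is only an illustration of the decision rule in the text; the function name, its parameters and the labels used for the computers are hypothetical, and the study does not describe an actual interface.

```python
# Sketch of query routing in the six computer configuration.
# ASSUMPTION: names and parameters are hypothetical, for illustration only.
from typing import List, Optional

MASTER = "RAPID"
AREA_COMPUTERS = ["First Army", "Third Army", "Fourth Army", "Fifth Army", "Sixth Army"]


def route_query(answerable_by_master: bool,
                known_areas: Optional[List[str]] = None) -> List[str]:
    """Return the computers that must be consulted to answer a query."""
    if answerable_by_master:
        # Many queries are satisfied from the critical elements held centrally.
        return [MASTER]
    if known_areas:
        # Poll only the area computer(s) known to hold the information.
        return list(known_areas)
    # Location unknown: all five area computers are queried.
    return list(AREA_COMPUTERS)


print(route_query(True))
print(route_query(False, ["Third Army"]))
print(route_query(False))
```

The last case is the costly one: every area computer must respond before the answer is complete, which is the source of the indeterminate delays discussed earlier.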

The third configuration provides for twelve com- 
puters carrying (primarily) personnel records, plus 
the RAPID system computer in Washington. Figure 
4 shows a schematic diagram of this. 



Legend: O = computer site in each Army. Each
installation includes central processor, mass storage
and peripherals.

Figure 3 — Six computer configuration

In this approach, a configuration was developed 
wherein the two major employers of civilian personnel, 
the Army Materiel Command and the Corps of En- 
gineers, retained the records of their own personnel 
in their own computers. The balance of the employees 
are serviced in Army area computers as in Configura- 
tion II. 

In an effort to store the data as close to the Civilian
Personnel Offices as feasible, a twenty-two computer
configuration was also postulated and studied. A
schematic of this is shown in Figure 5. Again, the
RAPID system computer would serve the same
function as described in the explanation of Configuration II.

TABLE IV — Six computer configuration description

Command         Location                                   Personnel    Number of
                                                           Serviced     CPO's

DCSPER          Pentagon
First Army*     Ft. Meade, Maryland                         163,000        86
Third Army      Ft. McPherson, Georgia                       57,000        22
Fourth Army     Ft. Sam Houston, Texas                       45,000        20
Fifth Army      Ft. Sheridan, Illinois                       63,000        30
Sixth Army**    The Presidio, San Francisco, California      52,000        29

                                                            380,000       187

* To include records for U. S. personnel in Europe, the Far East and Southern Command.
** To include records for U. S. employees in Hawaii and Alaska.

Legend: O = computer site in each Army and at indicated
commands. Each installation includes central
processor, mass storage and peripherals.

Figure 4 — 13 computer configuration

Legend: computer at each geographic location indicated in
Exhibit 5-2. Each installation includes central
mass storage and peripherals.

Figure 5 — 22 computer configuration

Hardware technology 

The general considerations which must be taken 
into account in designing a complex system such as the 
one under discussion can be divided into two major 
areas. One is concerned with the hardware to be em- 
ployed and the other with the software. The hardware 
area, in turn, can be subdivided into four parts, while 
there are two distinct aspects of software use to be 
considered. 

The following discussion will be concerned only with
the hardware aspects, specifically:

1. The central processor hardware considered
during the study,

2. The mass random access storage systems,

3. The input/output terminals to be located at
the various Civilian Personnel Offices and
Headquarters offices throughout the Continental
United States, and

4. The communications techniques or channels to
be utilized.


Of the four specific hardware subsystems, the communication
channels, the terminals and the computers
are highly interrelated. Further, the communication
software package, whether provided by the hardware
manufacturer or developed by the contractor, must
be integrated in such a fashion as to permit the large
volumes of data transfer to proceed in a smooth,
well-integrated fashion. This was assumed to be true for
this study.

Automatic data processing systems 

Following a detailed examination of the automatic 
data processing systems available at the time of the 
study (1967), a representative selection was made. 
Summary data concerning these systems will be found 
in Table V. 

Random access mass storage systems 

Of the four configurations specified, Configuration I
requires the largest and fastest mass random access
storage subsystem. Under this configuration, an on-line
storage capacity (at one computer site) of an estimated
1.8 billion bytes of information will be required.
Today, no single device is available which can meet
this requirement. To attain this storage volume a
number of units must be integrated into the total
system. For example, use of the RCA RACE unit,
actually the cheapest available device on a dollar per
byte basis, would still require three units to accommodate
the total volume of data. To purchase these
units (including their individual control units), will
cost nearly half a million dollars, while rental would
be about $12,000 a month.

Table VI presents a summary description of the 
most likely candidates for the mass random access 
storage systems. Note that this listing of devices in- 
cludes the largest mass storage units that are presently 
available as well as smaller units which have been 
considered for utilization with the smaller decentralized 
centers described in Configuration IV. Figure 6 shows 
the storage capacity as the independent variable with 
the cost/performance ratio shown in cents per bytes 
stored. 


Communications channels 

Initially, the U. S. Postal System was considered
as a valid technique of transmitting the daily accumulated
data (from each Civilian Personnel Office) to
the computer site. Further examination raised two
objections. These were:

1. Cost involved. 

2. Transmission delays. 

The use of the Postal Service entails several costs 
which can be summarized, on a monthly basis, as: 

Postage                                 $37,884
Packaging                                   233
Addressing and Handling                     375
Replacing Damaged and Lost Reels            375

Total                                   $38,867 or $240/month/CPO
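
The monthly postal cost can be re-added as follows. The line items are those listed above. The divisor of 160 Civilian Personnel Offices (the Continental United States count given earlier) is an assumption about how the per-office figure was obtained; with it the result is about $243, in line with the rounded $240 quoted.

```python
# Monthly cost of mailing data reels, re-added from the listed items.
# ASSUMPTION: the per-CPO figure is taken over the 160 continental CPOs.
MONTHLY_POSTAL_COSTS = {
    "Postage": 37_884,
    "Packaging": 233,
    "Addressing and handling": 375,
    "Replacing damaged and lost reels": 375,
}
CONTINENTAL_CPOS = 160

total_monthly_cost = sum(MONTHLY_POSTAL_COSTS.values())
cost_per_cpo = total_monthly_cost / CONTINENTAL_CPOS

print(f"total: ${total_monthly_cost:,} per month")
print(f"per CPO: ${cost_per_cpo:,.0f} per month")
```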
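TABLE VI 

| Manufacturer, name and model | Capacity: number of bytes | Capacity: units per control | Storage: media | Storage: tracks per surface | Access: number of heads per mechanism | Access: [heading illegible] | Access time (ms): minimum | Access time (ms): average | Access time (ms): maximum | Price*: purchase | Price*: rental | Price*: cost per byte (¢/B) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RCA 568-11 (RACE) | 560M | 8 | STRIP | 128 | [illegible] | 16 | 20 | 200 | 385 | $145K | | .026 |
| IBM 2321 (DATA CELL) | 400M | 8 | STRIP | 100 | 20 | 20 | 25 | 300 | 600 | [illegible] | $2,800 | .034 |
| DATA PRODUCTS 5085 | 400M | 1 | DISK | 510 | 1 | 8 | 50 | 85 | 250 | 143K | | .036 |
| BRYANT 2AC 4000 | 419M | [illegible] | DISK | 728 | 2 | 6 | 30 | 110 | 180 | 375K | | [illegible] |
| BURROUGHS 9375 | 500M | 1 | DISK | 150 | 150 | 1 | 0 | 60 | 120 | $590K | 9,900 | .118 |
| IBM 2314 | 210M | 1 | DISK | 200 | 1 | 18 | 25 | 75 | 135 | 250K | | [illegible] |
| CONTROL DATA 814 | 151M | [illegible] | DISK | 192 | 1 | 4 | 20 | 50 | 110 | 230K | 5,500 | |
| UNIVAC FASTRAND 2 | 100M | 8 | [illegible] | [illegible] | 1 | 64 | 39 | 92 | 154 | 165K | [illegible] | .165 |
| NCR 353-3 (CRAM) | 18M | 16 | STRIP | 56 | 56 | 1 | 24 | 235 | 235 | 35.5K | | [illegible] |
| SDS 7202 | 737K | 8 | DISK | [illegible] | [illegible] | 1 | 0 | 17 | 35 | 18K | | [illegible] |
| IBM 2311 | [illegible] | 1 | DISK | 200 | 20 | 10 | 25 | 75 | 135 | 26.3K | | |

* Arranged in order of cost per byte of storage. 
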
Figure 6 — Cost (cents per byte) /capacity (bytes) 
[Plot not reproduced. Horizontal axis: capacity (bytes), 1M to 10B.] 

STORAGE UNIT LEGEND 

 1. RCA 568-11 (RACE) 
 2. IBM 2321 (DATA CELL) 
 3. DATA PRODUCTS 5085 
 4. BRYANT 2AC 4000 
 5. BURROUGHS 9375 
 6. IBM 2314 
 7. CDC 814 
 8. UNIVAC FASTRAND II 
 9. NCR 353-2 (CRAM) 
10. SDS 7202 
11. IBM 2311 (DISC PACK) 


For electrical communications, two distinct 
classification schemes can be employed to facilitate 
the analysis. These are: 

1. Governmental/non-governmental facilities. 

2. TWX voice grade/broad band facilities. 

Comparisons among costs for each of the services 
noted become very involved and complicated for a 
single configuration, let alone for four. However, each 
service is described below, following a brief discussion 
of the data volumes expected. With approximately 
380,000 United States citizens covered by the system, 
and with a flow of forty to fifty characters per man per 
day over the communications channels, it seems almost 
mandatory that a “dedicated” communications system 
be available. 

In the event personnel records are processed by a 
computer used for other applications as well, it is 
assumed that the personnel system will be available 
for major update processing and for query response 
during the third shift (eight hours). With an average 
of fifty characters/man/day traffic for 380,000 records, 
there will be a traffic flow of nearly twenty million 
bytes (eight-bit characters) a day. With a 70 percent 
line utilization, this requires a transmission capability 
of about 1,000 bytes/second. 
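
The arithmetic behind the 1,000 bytes/second figure can be checked with a short sketch. It assumes, as the paragraph above suggests but does not state outright, that the whole day's traffic is moved during the eight-hour third shift; the function and parameter names are illustrative only.

```python
def required_rate(records, chars_per_record_per_day, window_hours, utilization):
    """Return (daily bytes, bytes/second needed) for one day's traffic.

    Assumption: all of the day's traffic is transmitted inside a single
    window of `window_hours`, of which only `utilization` is usable.
    """
    daily_bytes = records * chars_per_record_per_day
    usable_seconds = window_hours * 3600 * utilization
    return daily_bytes, daily_bytes / usable_seconds


# Figures taken from the text: 380,000 records, 50 characters/man/day,
# an eight-hour shift, and 70 percent line utilization.
daily_bytes, rate = required_rate(380_000, 50, 8, 0.70)
print(f"{daily_bytes:,} bytes/day, about {rate:.0f} bytes/second")
```

Under these assumptions the daily volume is 19 million bytes and the required rate comes out somewhat below 1,000 bytes/second, which is consistent with the rounded figure quoted above.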

Autodin 

Autodin can be used to provide the type of service 
required. However, the following points serve to 
eliminate it from consideration: 

1. The service is not available at approximately 
20 percent of the Civilian Personnel Office sites. 

2. It is an extremely costly means of transmitting 
civilian personnel data. The charge is a fixed 
rate per site and is high, in part at least, because 
the system must be able to pass classified in- 
formation. Since civilian personnel data would 
not be classified, except possibly for occasional 
specific information that would be afforded 
special handling, this costly aspect would not be 
needed. 

3. Civilian personnel information would be af- 
forded a low priority as compared to other data 
using Autodin. This would cause delays of 
variable and indeterminate length. 

The cost, per site or terminal, depends upon the 
line bandwidth required, not the distance the message 
is sent or the line usage (time). These costs are shown 
in Table VII. 

Autovon 

Autovon is a military leased, voice grade, direct 
dial telephone system. There is no apparent reason 
for not employing this system to transmit digital 
data during off-hours (6 p.m. to 6 a.m.). 

Charges are variable, but an estimate of $315- 
$372/month/CPO seems reasonable. 

Hard decisions concerning the use of Autovon for 
digital data transmission were not obtained, though 
statements were made that Autovon is used in some 
cases for data transmission. Neither could any 
indication be obtained that lower or preferential 
rates were available for night use. 


TABLE VII — Costs 

| Autodin Bandwidth (Baud) | Cost/Month/Terminal Site |
|---|---|
| 75 | $1,188 |
| 150 | 2,375 |
| 1,200 | 9,504 |
| 2,400 | 14,250 |


Wide area telephone service (WATS) 

The most attractive data transmission channel 
studied during this effort, from the viewpoint of the 
Army Civilian Personnel Program, is the Wide Area 
Telephone Service (WATS). WATS offers two billing 
plans: a 24-hour, unlimited service and a measured 
time service. 

Under a measured time WATS contract, a basic 
monthly charge for the first ten hours of usage is 
made and an additional charge per hour of actual 
usage is levied. The tariff which governs this service 
is extremely detailed and a full discussion is beyond 
the scope of this report. However, a single computa- 
tion will indicate the method of selecting between the 
unlimited and the measured WATS. 

A single WATS line with a six band capability (full 
48 state coverage), based in Washington, D.C. costs 
$2,250 per month. The measured WATS, with the 
same capability, costs: 

    C = 370 + 29H 

where 

    C = cost in dollars per month, 
    H = hours of usage beyond the first ten hours, per month. 

The break-even point can be calculated by setting 
C = 2,250 and solving for H. This yields a value of 
H = 65 hours, i.e., 75 hours/month of total circuit time 
(including the first ten hours), or approximately three 
hours/day. 
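
The break-even computation can be reproduced with a minimal sketch. The tariff constants are the ones quoted above; the function names are illustrative, and the tariff is assumed to be exactly linear beyond the first ten hours.

```python
FULL_TIME_MONTHLY = 2250   # unlimited six-band WATS line, dollars/month
MEASURED_BASE = 370        # measured WATS, covers the first ten hours
MEASURED_HOURLY = 29       # dollars per additional hour
INCLUDED_HOURS = 10


def measured_cost(total_hours):
    """Monthly cost of measured WATS for a given total circuit time."""
    extra_hours = max(0.0, total_hours - INCLUDED_HOURS)
    return MEASURED_BASE + MEASURED_HOURLY * extra_hours


def break_even_total_hours():
    """Total hours/month at which measured and unlimited WATS cost the same."""
    extra_hours = (FULL_TIME_MONTHLY - MEASURED_BASE) / MEASURED_HOURLY
    return extra_hours + INCLUDED_HOURS


hours = break_even_total_hours()
print(f"break-even at about {hours:.1f} hours/month of circuit time")
```

Below the break-even usage the measured plan is cheaper; above it the unlimited plan is. How the monthly figure converts to hours per day depends on how many days per month the line is used, which the text does not specify.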

Definitive evaluation for the Army Civilian Per- 
sonnel System must await final implementation de- 
cisions. However, a computation concerning the use 
of WATS for several possible configurations has been 
carried out, and is detailed in Table VIII. 

Leased broad band lines 

Finally, use of leased broad band lines was examined. 
Their use was considered only for Configuration I 
implementation. 

Many possible line linkages can be conceived. The 
one demonstrated here is for illustrative purposes only, 
but is typical. 

Postulate that a concentration device, or subsystem 
(such as a very small digital computer with magnetic 
tape or disk), will be located at each Army Headquarters. 
A WATS system, similar to the system described 
for Configuration II (see Table VIII), will be installed. 
Leased broad band lines would also be installed from 
the five Army Headquarters to the RAPID site. 

A broad band channel capable of transmitting 5,100 
characters/second costs $15/mile/month. In addition 
each terminal requires a termination which rents for 
$250 each. As a result, monthly rental for the 
communication channels (and their terminations), but 
not the concentration, would be $18,465. 


TABLE VIII — Number of WATS lines required vs configurations 

| Configuration | Location | Number of Bands | Bands | Cost/Month* |
|---|---|---|---|---|
| I | Washington, D. C. | 4 | 1, 3, 6, 6 | $ 6,100 |
| | Total | | | $ 6,100 |
| II | Ft. Meade, Maryland | 2 | 1, 3 | $ 2,075 |
| | Ft. McPherson, Georgia | 1 | 1 | 1,300 |
| | Ft. Sam Houston, Texas | 1 | 1 | 2,015 |
| | Ft. Sheridan, Illinois | 1 | 5 | 2,075 |
| | Presidio, California | 1 | 2 | 2,000 |
| | Subtotal | | | $ 9,465 |
| | RAPID System Input | 1 | 6 | $ 2,250 |
| | Total | | | $11,715 |
| III | Ft. Meade, Maryland | 1 | 3 | $ 1,575 |
| | Ft. McPherson, Georgia | 1 | 1 | 1,300 |
| | Ft. Sam Houston, Texas | 1 | 1 | 2,015 |
| | Ft. Sheridan, Illinois | 1 | 5 | 2,075 |
| | Presidio, California | 1 | 2 | 2,000 |
| | Corps of Engineers, D. C. | 1 | 6 | 2,250 |
| | AMC, D. C. | 1 | 6 | 2,250 |
| | MUCOM, Edgewood, Maryland | 1 | 6** | 2,250 |
| | T & E Com., Aberdeen, Maryland | 1 | 6** | - |
| | WeapCom., Rock Island, Illinois | 1 | 5 | 1,500 |
| | ECom., Ft. Monmouth, New Jersey | - | - | - |
| | Missile Com., Huntsville, Alabama | - | - | - |
| | Subtotal | | | $17,215 |
| | RAPID System Input | 1 | 6 | 2,250 |
| | Total | | | $19,465 |

* Includes intrastate charges, as required. 
** Shared. 



Terminals 

Terminals must have certain attributes. The 
requirements will vary, depending on where the 
terminal is located (for example, at a major command 
headquarters, or in an operating Civilian Personnel 
Office), on the final network configuration selected, 
and on the communication means employed. However, 
certain minimal capabilities can be specified: 

1. The data terminal must possess an “extended” 
(ASCII) keyboard, for entry of data. 

2. The terminal must have an extended storage 
ability to retain up to one day’s input for 
transmission to the data processing site, with 
due allowances for peaks. 

3. The terminal must be capable of receiving and 
storing information transmitted by the servicing 
computer (in general, during the night) 
in response to ad hoc requests or standing 
requirements. 

4. The terminal must be capable of producing in 
hard-copy form all information transmitted 
from the computer site to the Civilian Personnel 
Office or other location either on-line, or on a 
delayed basis. 

5. The unit must be able to communicate, via 
low cost (voice grade) telephone lines with 
the computer center. 

After careful consideration of the characteristics 
of the many available principal Input/Output 
equipments, we have narrowed the field for further 
consideration to three. These are: 

1. Mohawk 1103. 

2. Dartex 1022. 

3. Communitype 100SR. 


TABLE VIII — (Contd) — Number of WATS lines required vs configurations 

| Configuration | Location | Number of Bands | Bands | Cost/Month* |
|---|---|---|---|---|
| IV | Boston | 1 | X | $ 330 |
| | New York City | 2 | 1, X | 1,150 |
| | Philadelphia | 2 | 1, X | 875 |
| | Baltimore | 1 | X | 375 |
| | D. C. Area | 1 | 1 | 500 |
| | Atlanta | 2 | 1, X | 1,300 |
| | Kentucky | 2 | 1, X | 1,200 |
| | Chicago | 2 | 2, X | 1,475 |
| | St. Louis | 1 | X | 645 |
| | Kansas City | 1 | X | 610 |
| | Colorado | 2 | 2, X | 1,825 |
| | New Orleans | 2 | 1, X | 1,400 |
| | Texas Gulf | 1 | X | 815 |
| | NE Texas | 2 | 1, X | 1,815 |
| | West Texas | 2 | 1, X | 1,915 |
| | South California | 2 | 1, X | 1,850 |
| | San Francisco | 1 | X | 650 |
| | Hawaii | - | - | No information |
| | Utah | 1 | X | 500 |
| | Seattle | 2 | 1, X | 1,775 |
| | Alaska | | | No information |
| | Total | | | $21,005 |

X = Intrastate charges. 



The selection 

In performing the cost analysis for each of the four 
configurations examined, four different manufacturers’ 
equipments were examined in the context of the 
processing loads required for both the centralized and 
distributed configurations. Table IX shows the specific 
central processors considered. 

Although there are many approaches that can be 
followed to select a central processor for each of these 
configurations, one dominant constraint controlled 
the selections. This constraint was the requirement 
that the computer be able to utilize the amount of 
random access storage needed by the system at each 
location. Thus, although there are smaller computers 
available, not only from the four manufacturers 
whose equipments were examined, but also from other 
sources, the equipment selected represented, in general, 
the smallest computers that could do the job. 
TABLE IX — Central processors employed in the cost analysis for each configuration 

| Manufacturer | I | II | III | IV |
|---|---|---|---|---|
| CDC | 3304 | 3304 | 3114 | 3114 |
| BURROUGHS | 3501 | 3501 | 2501 | 2501 |
| IBM | 360/40 | 360/40 | 360/30 | 360/30 |
| RCA | 70/45 | 70/45 | 70/35 | 70/35 |

Configuration I 

Configuration I has a single centralized site into 
which all Civilian Personnel Offices address their data. 
Detailed examination showed that approximately 1.8 
billion bytes of information would be the largest amount 
of dedicated random access storage required at this 
location. Supporting that subsystem would be a 
high-speed disk subsystem of approximately 100 million 
bytes. This subsystem would act as a directory and 
contain a high-use skeletal record for each employee 
whose total record was contained in the random access 
storage system. The central processor at this 
site is provided with approximately 65,000 bytes of 
high-speed core storage and a normal complement of 
standard peripherals, such as a high-speed printer, a card 
reader and punch, and a communications control unit, as 
well as eight high-performance magnetic tape units. 
The magnetic tape units are employed to maintain an 
on-line journal of all system actions. They also act as 
replicate security repositories of current information 
in the event of equipment destruction, electrical 
information losses, over-writing, etc. In addition, the 
magnetic tape drives can be employed for other purposes 
during non-civilian personnel operations at the 
site. Finally, magnetic tape would be used to store 
trailer information (overflow beyond single fixed-format 
record storage capacity) and archival information. 

Communications with the centrally located computer 
site can be handled by four WATS lines. Two of these 
would cover the 48 states, one the Eastern one-fourth 
of the United States, and one the Eastern 
Seaboard. The extent of these lines, that is, the number 
of WATS bands, has been selected to provide the 
optimal coverage of CONUS based upon the geographic 
distribution of Civilian Personnel Offices and the 
populations they support. 

Finally, each Civilian Personnel Office was examined 
to determine the number of terminals required. A 
keyboard, hardcopy printer, and an intermediate 
storage capability are considered a mandatory 
requirement for this system application. 

To better understand the operation of the Configuration I 
system, consider the requirements for data 
transmission from and into a Civilian Personnel Office 
on a daily basis: 26 characters per man per day 
(on the average) are input to the computer from 
a Civilian Personnel Office, while 24 characters per 
man per day are output from the computer to a 
Civilian Personnel Office. With these figures, an 
estimate of the communication requirements can be made. 
Similarly, the estimate of the actual keyboard typing 
or data outputting can be obtained. 

Taking all these facts plus the data provided in an 
earlier section into account, it can be shown that the cost 
for Configuration I will be on the order of $130,000 to 
$150,000 rental per month. 

Configurations II, III and IV 

Configuration II is schematically represented in 
Figure 3. Configuration II represents a total of five 
computer sites and thus the amount of rental required 
to support these sites does increase. Similarly, the 
communication cost rises from approximately $6.1K to 
$11.7K per month. 

Configurations III and IV have been treated in the 
same fashion as Configuration II. 

Detailed Configurational Comparison 

The results developed thus far may now be applied 
to the crucial problem — which of the four postulated 
configurations is recommended and why. The central 
processors selected for examination with respect to 
the four configurations postulated have been specified 
in Table IX. 

In presenting a detailed description and price com- 
parison, Table X summarizes the key requirements, 
i.e., the number of personnel serviced, mass random 
access storage capacities, and the data transmission 
volumes for each of the four configurations. These data 
were employed as guides in the hardware selections. 
Detailed equipment specifications and pricing/rental 
were also examined and are summarized in Table XI. 
TABLE X — Summary of requirements for the four configurations examined 

| Configuration Number | I | II | III | IV |
|---|---|---|---|---|
| Number of Computer Sites | 1 | 5 | 12 | 21 |
| People Serviced | 380K | | | |
| Maxima (Excluded from Range Figures) | | 163K | 48K, 75K, 83K | 47K, 58K |
| Minimum | | 45K | 11K | 2K |
| Range (Not including Maxima) | | 45K-63K (4) | 11K-27K (9) | 2K-31K (19) |
| Average for Range | | 54K | 19K | 14K |
| RAM Capacity Requirements (Bytes) | 1.8G | | | |
| Maxima (Excluded from Range Figures) | | 655M | 225M, 350M, 390M | 223M, 180M |
| Minimum | | 162M | 52M | 15M |
| Range (Not including Maxima) | | 162M-251M | 52M-127M | 15M-118M |
| Average for Range | | 201M | 90M | 62M |
| RAPID Supplemental RAM Capacity Required | 0 | 1.2G | 1.2G | 1.2G |
| Data Transmission Volumes (Characters per Day) | 19M | | | |
| Maxima (Excluded from Range Figures) | | 8.2M | 2.4M, 3.6M, 4.2M | 2.4M, 2.9M |
| Minimum | | 2.3M | 0.6M | 0.1M |
| Range | | 2.3M-3.2M | 0.6M-1.4M | 0.1M-1.6M |
| Average for Range | | 2.7M | 1.0M | 0.7M |



Two key aspects of the information contained in 
Table XI have been plotted to provide a clearer view. 
These are: 

1. Hardware comparisons (exclusive of communica- 
tions and terminal costs) for the four manu- 
facturers, for each of the four configurations. 

2. Comparison of monthly rentals for all hard- 
ware aspects (using an average set of values 
for the on-site computers and their conventional 
peripherals). 

Point 1 is amply described in Figure 7, while Point 
2 is presented in Figure 8. 

At this point, Configurations III and IV were dropped 
from further consideration. The few advantages which 
could be enumerated in their favor were not sufficient 
to outweigh the added costs. 

The selection between Configurations I and II 
appears less clear cut. Although the monthly rental 
for Configuration II is approximately 70 percent 
greater than the monthly rental for Configuration I, 
other factors must be examined. Only then can a 
decision be made. 
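
The "approximately 70 percent" figure can be checked against the monthly rental totals listed for Configurations I and II in Table XI. The sketch below is only a consistency check; averaging the four manufacturers' percentages with equal weight is an assumption, since the text does not say how the figure was derived.

```python
from statistics import mean

# Total monthly rentals in thousands of dollars from the "Totals" rows
# of Table XI: (Configuration I, Configuration II) per manufacturer.
rentals_k = {
    "CDC": (138, 255),
    "Burroughs": (149, 227),
    "IBM": (131, 221),
    "RCA": (131, 235),
}


def percent_increase(config_1, config_2):
    """Percentage by which the Configuration II rental exceeds Configuration I."""
    return 100.0 * (config_2 - config_1) / config_1


increases = {name: percent_increase(*pair) for name, pair in rentals_k.items()}
average_increase = mean(increases.values())

for name, value in increases.items():
    print(f"{name:10s} {value:5.1f} percent")
print(f"average    {average_increase:5.1f} percent")
```

The individual increases vary considerably by manufacturer, but their unweighted average falls close to the 70 percent quoted above.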

In favor of Configuration I implementation are: 

1. Lower monthly rental. 

2. File centralization in one physical location close 
to Department of the Army and Department of 
Defense headquarters activities. 

3. No undesirable redundancy in either hardware, 
software or machine processing. Also, if this 
processor is identical with that of the present 
RAPID system, then each can act as back-up 
for the other. 

4. Availability of a “dedicated” computer for 
Army Civilian Personnel record-keeping. This 
implies that a self-established priority system 
can be employed. 

5. A minimum of highly skilled ADP programmers, 
operator personnel, etc., required. 

6. Data base “timeliness” and uniformity. 

7. No limitations on “cross servicing.” 
TABLE XI — Purchase and rental comparison — four computer manufacturers and four system configurations 

| Subsystem | I Purchase (M$) | I Rental (K$*) | II Purchase (M$) | II Rental (K$*) | III Purchase (M$) | III Rental (K$*) | IV Purchase (M$) | IV Rental (K$*) |
|---|---|---|---|---|---|---|---|---|
| Computer Site (less Mass RAM) | | | | | | | | |
| CDC | 1.0 | 26 | 4.6 | 115 | 9.6 | 226 | 14.4 | 384 |
| Burroughs | 0.8 | 16 | 2.6 | 60 | 5.6 | 132 | 10.1 | 231 |
| IBM | 1.1 | 17 | 4.0 | 83 | 8.7 | 179 | 15.4 | 316 |
| RCA | 1.2 | 26 | 4.7 | 96 | 8.6 | 196 | 17.0 | 345 |
| Mass RAM (Including RAPID Supplement) | | | | | | | | |
| CDC | 0.9 | 19 | 1.7 | 41 | 2.4 | 43 | 4.1 | 70 |
| Burroughs | 2.4 | 40 | 2.1 | 67 | 3.4 | 55 | 7.5 | 107 |
| IBM | 0.8 | 17 | 1.9 | 40 | 3.5 | 65 | 4.4 | 95 |
| RCA | 0.6 | 12 | 2.0 | 40 | 3.6 | 65 | 4.4 | 95 |
| Communications Channels-WATS | - | 6 | - | 12 | - | 20 | - | ~21 |
| Terminals | 3.5 | 88 | 3.5 | 88 | 3.5 | 88 | 3.5 | 88 |
| Totals | | | | | | | | |
| CDC | 5.5 | 138 | 9.7 | 255 | 15.6 | 371 | 22.0 | 563 |
| Burroughs | 6.6 | 149 | 10.5 | 227 | 13.5 | 294 | 21.2 | 447 |
| IBM | 5.3 | 131 | 8.9 | 221 | 15.7 | 351 | 23.3 | 519 |
| RCA | 5.3 | 131 | 10.1 | 235 | 16.7 | 378 | 24.8 | 549 |

* = Monthly 
M = Millions 
K = Thousands 



On the other hand, Configuration II provides: 

1. Local, autonomous control at the Army level 
of each computer system. 

2. Redundancy of equipment which offers an 
alternative processing site in the event a system 
is down. 

3. With lower processing loads per machine, cost 
sharing could be practiced. 

No numerical weighting of these advantages seems 
appropriate. However, after a thoughtful review of 
each point, and a careful summation of all the points 
concerning each alternative, one is left with but one 
reasonable choice: Configuration I. 

SUMMARY 

The investigation demonstrated that a highly distrib- 
uted, Automated Personnel and Manpower System 
was feasible and would be cost-effective. It also showed, 
rather forcefully, that although the terminals were 
located throughout the country, a single concentrated 
central processing site was by far the most economical 
approach to the system implementation. 

Figure 7 — Monthly rental of CONUS computer systems 
for four manufacturers for all four configurations 
(Computer hardware only) 
[Plot not reproduced. Horizontal axis: configuration number (I, II, III, IV) and number of computer sites.] 

An interesting fallout of the study was the fact 
that the cost of the communications channels required 
to support the system accounted for only one and 
one-half to three percent of the cost of the fully-im- 
plemented system. 

Finally, the broadest result of the study was the 
conclusion that real time, on-line (or quasi on-line) 
systems were practical, cost-effective and currently 
attainable. 
Analysis of the communications aspects 
of an inquiry-response system 


by J. S. SYKES 

Bell Telephone Laboratories, Incorporated 
Holmdel, New Jersey 

INTRODUCTION 

In order to meet the information retrieval needs of 
various industries, inquiry-response systems are being 
implemented by storing large data bases in centralized 
computer files. In some systems, the files are accessed 
by personnel primarily as the result of telephone calls 
from customers. As an example, in the airlines industry, 
computer files are accessed by reservation clerks to 
determine the availability of reservations for a specific 
flight. In this example, and in similar applications 
involving queries or requests from customers, input 
messages requesting certain information are generated 
by a customer representative and then transmitted 
to a computer from an input-output terminal such as 
a visual display device. When the computer has ob- 
tained the requested information, a response message is 
transmitted back to the requesting terminal, and the 
representative continues her dialogue with the customer. 

For an inquiry-response system to function properly, 
the system must be designed to meet two grade-of- 
service, or performance, objectives. One objective is 
concerned with the interval a customer must wait 
before his call is answered by a representative. The 
other objective is concerned with the interval a cus- 
tomer must wait during the conversation until the 
customer representative can secure the necessary 
information from the computer; the naturalness of the 
dialogue degenerates as the retrieval time* increases. 


* In this analysis, the retrieval time is defined as the interval from 
the time an input message is generated until the complete response 
has been received. 


In order to meet the first objective, sufficient repre- 
sentatives must be available to handle the incoming 
voice traffic. To meet the second objective, an adequate 
data communications subsystem and sufficient com- 
puter processing capability must be provided. 

In this paper an analytical model is presented that 
approximates the interaction of the voice and data 
communications subsystems in an inquiry-response 
system. The model can be used during preliminary in- 
vestigations to gain insight and to obtain conservative 
estimates of communications capabilities required in 
order for a system to meet specified grade-of-service 
objectives. The model consists of relationships that 
involve basic communications parameters such as the 
following: 

a. Rate at which calls are received from customers 

b. Interval required for representatives to handle 
incoming calls 

c. Number of input and corresponding computer 
response messages generated as the result of 
a customer call 

d. Data transmission rates to and from the com- 
puter 

e. Lengths of input and response messages. 

The model uses these relationships in order to estimate 
the following quantities: 

1. The number of customer representatives re- 
quired in order to handle a given volume of 
offered calls at a specified grade of service 

2. The volume of data traffic generated as a result 
of the incoming voice traffic 

3. The number of equivalent active terminals 
that can be served by a data link while meeting 
a specified retrieval time objective; an estimate 
of the retrieval time as a function of carried data 
traffic is used to obtain this quantity. 

4. The number of data links required in order to 
meet a specified retrieval time objective 

5. The average occupancy of the one or more data 
links serving the input-output terminals. 

For illustrative purposes, the analytical model is 
applied to a hypothetical inquiry-response system. 
Both half-duplex** and full-duplex** methods of opera- 
tion are considered for the data communications sub- 
system. For this example, estimates of average re- 
trieval time are obtained with mathematical queuing 
models. 

Assumed system characteristics 

The analysis and its application presented in this 
paper are based on assumptions concerning the in- 
coming voice traffic, the characteristics of the data 
communications subsystem connecting the input- 
output terminals to the computer, and the computer 
processing capability. These assumptions are considered 
in this section. 

The basic assumptions that have been made concern- 
ing the origination and nature of the voice traffic are 
the following: 

1. The overall system is in a state of statistical 
equilibrium. 

2. Calls are generated by customers in accordance 
with a Poisson distribution, which implies a 
large group of potential customers. 

3. Durations required for representatives to handle 
incoming calls*** are distributed according to a 
negative exponential probability law. 

4. Calls are answered immediately when there is 
a customer representative not currently engaged 
in a conversation; all other calls experience 
delay. 

5. Delayed calls are answered in a first-come, first- 
served order as representatives become free. 


** With half-duplex operation, message transmission is allowed in 
either direction, but not both directions simultaneously; simul- 
taneous transmission in both directions is called full-duplex 
operation. 

*** These durations would consist of the talking time with the 
customer plus subsequent time (if any) required to perform 
call-related tasks. 


The assumed overall configuration of the voice- 
access network as well as the data communications 
subsystem is illustrated in Figure 1. The voice-access 
network is assumed to consist of the established tele- 
phone network that provides line-switched connections 
from the customer to the business location. Calls are 
automatically routed to an idle representative unless 
they must be delayed; if so, the call distributor main- 
tains the calls in a queue until a representative becomes 
free. 

The data communications subsystem is assumed 
to consist of a group of input-output terminals such as 
visual display devices that are associated with a 
common control unit, which is connected to a computer 
by means of a data link. Various methods of operation 
are possible for this data communications configuration. 
These possibilities depend on whether or not message 
transmission is allowed simultaneously in both direc- 
tions, whether or not the computer requests traffic by 
means of polling, whether the polling characters are 
directed to individual terminals or to a control unit 
that gathers input messages from all of the terminals, 
etc. This analysis considers both full-duplex and half- 
duplex methods of operations. Polling of the control 
unit by the computer and multimessage transmissions 
in each direction are assumed. 

Computer processing time, as used in this paper, 
refers to the overall interval from the instant an input 
message* enters the multiprocessing computer until 
the corresponding response is placed in queue for 
transmission back to the requesting customer repre- 
sentative. Thus, processing time includes input message 
analysis, data retrieval from one or more memory files 
(perhaps even from another computer), and response 
preparation; in addition, the processing times may be 
prolonged by queuing delays within the computer. It 
has been assumed that an estimate of the average com- 
puter processing time for a system is available; as will 
be explained, this estimate is used in determining the 
average retrieval time. 

System analysis 

In this section the analytical model of the communications 
aspects of an inquiry-response system is developed. 

* Examples of input messages are initial inquiries, requests for 
page flips, and any subsequent inquiries generated during a 
customer’s call. In addition, in some systems, update messages 
may be sent to the computer, perhaps after a call has been ter- 
minated. If so, it is assumed that for each updating message the 
computer returns some type of acknowledgment. 
Figure 1 — Inquiry-response system 


Personnel required to handle offered voice load 

Assume that during the period of maximum incoming 
customer calls, i.e., the system busy hour, the calls are 
received at a rate $\lambda_v$. Assume further that the average 
duration required for representatives to handle incoming 
calls is $\bar{y}$. The voice load $E_v$ handled by the 
representatives is therefore given by 

$$E_v = \lambda_v \bar{y}, \tag{1}$$

where $E_v$ is commonly expressed in erlangs, a 
dimensionless unit. The number $S$ of personnel required to 
handle $E_v$ erlangs during the busy hour is dependent 
on the grade of service $G(x)$ to be offered customers, 
i.e., the promptness with which customers' calls would 
be answered. An example of $G(x)$ is the following: at 
least 0.95 of the customers' calls should be answered 
within $x = 20$ seconds from the time ringing begins. 

If the assumptions previously stated concerning the 
voice traffic are met, values of $S$ can be obtained for 
a specified $G(x)$ by using the following formulas:¹ 

$$G(x) = 1 - \mathrm{Prob}[\text{Answering Delay} > x \text{ secs}] = 1 - P[D > x] = 1 - P[D > 0] \exp\left[-\frac{x(S - E_v)}{\bar{y}}\right], \qquad 0 < E_v < S, \tag{2}$$

where $P[D > 0]$, commonly called the Erlang C function, 
is given by 

$$P[D > 0] = \frac{\dfrac{E_v^{S}}{(S-1)!\,(S - E_v)}}{\displaystyle\sum_{n=0}^{S-1} \frac{E_v^{n}}{n!} + \frac{E_v^{S}}{(S-1)!\,(S - E_v)}}, \qquad 0 < E_v < S. \tag{3}$$

Equation (2) is a result of A. K. Erlang's exponential 
holding time analysis. A summary of his analysis along 
with various delay curves was published by E. C. 
Molina.² For specified values of $S$, values of $P[D > 0]$ 
are tabulated in Reference 1 as a function of the ratio 
$E_v/S$. 
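
As a rough numerical companion to the grade-of-service relation, the sketch below evaluates the Erlang C probability of delay, the fraction of calls answered within x seconds, and the smallest number of representatives that meets a target. The function names and the sample inputs (10 erlangs, 180-second average handling time) are hypothetical and are not taken from the paper; only the 0.95 within 20 seconds objective is the paper's own example.

```python
from math import exp, factorial


def erlang_c(servers, load):
    """Probability that a call is delayed, P[D > 0], for 0 < load < servers."""
    if not 0 < load < servers:
        raise ValueError("requires 0 < load < servers")
    tail = load ** servers / (factorial(servers - 1) * (servers - load))
    head = sum(load ** n / factorial(n) for n in range(servers))
    return tail / (head + tail)


def grade_of_service(servers, load, mean_handling, x):
    """Fraction of calls answered within x (same time unit as mean_handling)."""
    delayed = erlang_c(servers, load)
    return 1.0 - delayed * exp(-x * (servers - load) / mean_handling)


def representatives_required(load, mean_handling, x, target):
    """Smallest number of representatives S with G(x) >= target."""
    servers = int(load) + 1  # smallest integer strictly greater than load
    while grade_of_service(servers, load, mean_handling, x) < target:
        servers += 1
    return servers


# Hypothetical busy-hour inputs: 10 erlangs offered, 180 s average handling.
S = representatives_required(load=10.0, mean_handling=180.0, x=20.0, target=0.95)
print("representatives required:", S)
```

The search simply increases S from the smallest stable value, which is adequate for the small staff sizes typical of such a system; tabulated Erlang C values, as cited in the text, serve the same purpose.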


Conversion of offered voice load to data traffic 

The amount of data traffic generated as the result 
of a customer call is a random variable. Some calls 
may involve only one or possibly two input messages 
and the associated responses. Other calls, which may be 
multipurpose, may require six or eight such interac- 
tions; in addition, some updating of the computer files 
may be involved. In this paper, $I$ will be used to 
represent the average number of interactions generated as 
the result of a call. 

Let $\lambda_i$ represent the average rate during the busy 
hour at which input messages are generated by the 
group of customer representatives served by one data 
link. By using $I$, $\lambda_i$ can be related to $\lambda_v$ as follows: 

$$\lambda_i = I \lambda_v. \tag{4}$$

Let $\lambda_r$ represent the average rate during the busy 
hour at which corresponding response messages are 
prepared by the computer and placed in the output 
queue for the data link. Since it is being assumed that 
each input message to the computer results in a 
response, the average rates $\lambda_i$ and $\lambda_r$ are equal. 

The second factor influencing the volume of generated
data traffic is the average time $\bar{t}_i$ required to transmit
a message to the computer. This quantity is the quotient of

$\bar{l}_i$ = the average number of characters that comprise
messages transmitted to the computer, and

$r_i$ = the rate of transmission from the control
unit to the computer, i.e.,

$$\bar{t}_i = \frac{\bar{l}_i}{r_i}. \tag{5}$$

Correspondingly, $\bar{t}_r$, the average time required to
transmit response messages from the computer to the
control unit, is given by

$$\bar{t}_r = \frac{\bar{l}_r}{r_r}. \tag{6}$$

The product of $\lambda_i$ and $\bar{t}_i$, which will be denoted by
$\rho_i(r_i)$, represents the erlangs of data traffic generated
during the busy hour for transmission at a rate $r_i$ from
the control unit to the computer. Likewise, the product
of $\lambda_r$ and $\bar{t}_r$, which will be denoted by $\rho_r(r_r)$, represents
the erlangs of response data traffic transmitted at a
rate $r_r$ from the computer to the control unit during
the busy hour.

With full-duplex message transmission, separate
one-way transmission facilities carry $\rho_i(r_i)$ and $\rho_r(r_r)$.
Therefore, expressions for the magnitudes of $\rho_i(r_i)$ and
$\rho_r(r_r)$ (in erlangs) can be independently determined
by using Equations (1), (4), (5), and (6), i.e.,

$$\rho_i(r_i) = \lambda_i \bar{t}_i,$$

which leads to

$$\rho_i(r_i) = \frac{E_v I \bar{l}_i}{\bar{v} r_i}. \tag{7a}$$

Similarly, since it is being assumed that $\lambda_i = \lambda_r$,

$$\rho_r(r_r) = \frac{E_v I \bar{l}_r}{\bar{v} r_r}. \tag{7b}$$

With half-duplex message transmission, the same
facility is alternately used for input and output traffic.
Therefore, $\rho_i(r_i)$ and $\rho_r(r_r)$ can be combined to give
$\rho_{tot}(r_i, r_r)$, i.e.,

$$\rho_{tot}(r_i, r_r) = \rho_i(r_i) + \rho_r(r_r) = \frac{E_v I}{\bar{v}} \left[ \frac{\bar{l}_i}{r_i} + \frac{\bar{l}_r}{r_r} \right]. \tag{8}$$

Let $\alpha$ represent the ratio of the average length of
response messages to the average length of input messages;
if $r_i = r_r$, Equation (8) then reduces to

$$\rho_{tot}(r_i, r_r) = \frac{E_v I \bar{l}_i}{\bar{v} r_i} [1 + \alpha]. \tag{9}$$

In summary, Equations (7), (8) and (9) reveal the
manner in which the various communications parameters
affect the amount of generated data traffic.
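The chain from voice load to data traffic in Equations (1) and (4) through (9) can be sketched as two small functions. The function and argument names are illustrative assumptions; times are taken in seconds and rates in characters per second.

```python
def data_traffic(e_v, v_bar, interactions, l_i, l_r, r_i, r_r):
    """Return (rho_i, rho_r, rho_tot) in erlangs for one data link."""
    lam_v = e_v / v_bar              # Equation (1) solved for the call rate
    lam_i = interactions * lam_v     # Equation (4)
    t_i = l_i / r_i                  # Equation (5)
    t_r = l_r / r_r                  # Equation (6)
    rho_i = lam_i * t_i              # Equation (7a)
    rho_r = lam_i * t_r              # Equation (7b), using lam_r = lam_i
    return rho_i, rho_r, rho_i + rho_r   # last entry is Equation (8)


def data_traffic_equal_rates(e_v, v_bar, interactions, l_i, alpha, rate):
    """Equation (9): total erlangs when both directions use the same rate."""
    return e_v * interactions * l_i / (v_bar * rate) * (1.0 + alpha)
```

With equal rates in both directions and `alpha = l_r / l_i`, the two functions agree, which is a quick check on the reduction from Equation (8) to Equation (9).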


Volume of data traffic allowed per data link

As indicated by the notation, calculated erlang
values obtained for $\rho_i(r_i)$, $\rho_r(r_r)$, and $\rho_{tot}(r_i, r_r)$ are
based on specified transmission rates. Erlang values
are not sufficient by themselves, however, to determine
the number of data links operating at the assumed rates
that would be required to implement the data communications
subsystem. For example, if $\rho_{tot}(r_i, r_r)$
were less than one erlang, it could be inferred that one
data link would suffice for that traffic. However, in
order to avoid excessive storage usage and extended 
retrieval times, data links cannot be used to their 
full capacity. In fact, as will be shown, the average 
retrieval time increases without bound as the average 
occupancy of a data link approaches unity; average 
data link occupancy refers to the average portion of 
the busy hour that the data link is being used for mes- 
sage transmission. 

Although data link occupancy must be limited, it
is desirable to use data links as efficiently as possible.
Let $\hat{\rho}_{tot}(r_i, r_r)$ denote the maximum volume (in
erlangs) of data traffic that can be carried by the data
link operating at transmission rates $r_i$ and $r_r$. For half-duplex
operation, $\hat{\rho}_{tot}(r_i, r_r)$ is numerically equal to
$\rho_{max}(r_i, r_r)$, where $\rho_{max}(r_i, r_r)$ represents the maximum
allowable occupancy for a data link operating at rates
$r_i$ and $r_r$.

For full-duplex operation, $\hat{\rho}_{tot}(r_i, r_r)$ is the sum of
$\hat{\rho}_i(r_i)$ and $\hat{\rho}_r(r_r)$, which represent the maximum data
volume (in erlangs) that can be carried on the input
and output links operating at rates $r_i$ and $r_r$, respectively.
For systems in which the input and output
traffic volumes are unequal, the average occupancy
of the input and output links may be considerably
different. For this case,

$$\rho_{max}(r_i, r_r) = \text{Max}[\rho_{max}(r_i), \rho_{max}(r_r)],$$

where $\rho_{max}(r_i)$ and $\rho_{max}(r_r)$ correspond to $\hat{\rho}_i(r_i)$ and
$\hat{\rho}_r(r_r)$, respectively. For a full-duplex system, $\hat{\rho}_{tot}(r_i, r_r)$
is not a constant but is dependent on the ratio $\bar{t}_r/\bar{t}_i$.

The value of $\hat{\rho}_{tot}(r_i, r_r)$ for a particular subsystem
is governed by the specified retrieval time objective
for the system. A commonly used objective is as follows:
The average retrieval time should be $T_{max}$
seconds or less. Another type of objective* can be
expressed similarly to the voice traffic grade of service
$G(x)$, e.g., 0.95 of the retrievals should be received
within $T'$ seconds.

Either a computer simulation or analytical means
can be used to determine values of $\hat{\rho}_{tot}(r_i, r_r)$ that
correspond to a specified retrieval time objective.
With a properly written simulation, one can obtain 
probability distributions as well as all moments of 
interest. However, using a simulation can be costly 
during preliminary investigations in which one is 
studying the effects of various communications param- 
eters on the retrieval time. For this reason, a well- 
formulated mathematical queuing model can be useful 
and rewarding for these investigations, even though 
results from queuing models that represent complex 
systems are often limited to average values. 

Number of input-output terminals allowed 
per data link 

An important consideration in the communications
design of an inquiry-response system is the maximum
number of active input-output terminals, or equivalently
the maximum number of active personnel, that
can be served by a particular data link without exceeding
a specified retrieval time objective. This maximum,
which will be denoted by $S_{max}$, is obviously related to
$\hat{\rho}_{tot}(r_i, r_r)$. In this section, a method is outlined for
approximating values of $S_{max}$ for specified values of
the grade-of-service objectives and the other communications
parameters. As will become apparent, the
method may be used iteratively to determine which
combinations of parameter values permit specified
voice traffic and retrieval time objectives to be met.
If costs are associated with these combinations, insight
can be gained concerning which means of implementation
is most economical.

* System studies are often desirable in the final design stages of
a system to determine whether a given design will allow a specified
percentile-type objective to be met. Because of mathematical
complexity, however, analytical methods can seldom if ever be
used for such studies; a simulation is normally required. For
preliminary investigations, analyses based on average values can
be used to obtain valuable insight concerning the sensitivity of
the retrieval time to various system parameters. This insight can
be very helpful in designing and running a subsequent simulation.
Lack of such insight often results in very costly system simulations.

The first step towards getting values of $S_{max}$ is to
use the specified values to construct graphs (using
Equations (3) and (8), respectively) that show $S$
versus $E_v$ and $\rho_{tot}(r_i, r_r)$ versus $E_v$, where $\rho_{tot}(r_i, r_r)$
represents the sum of $\rho_i(r_i)$ and $\rho_r(r_r)$ for both full-duplex
and half-duplex cases. Corresponding points
from these two graphs are then plotted to give a third
graph showing $S$ versus $\rho_{tot}(r_i, r_r)$; the value of $S$
corresponding to the point $\rho_{tot}(r_i, r_r) = \hat{\rho}_{tot}(r_i, r_r)$
is $S_{max}$.

The second step is to relate the values of $S_{max}$ and
the values of the retrieval time objective, e.g., $T_{max}$,
that correspond to equal values of $\hat{\rho}_{tot}(r_i, r_r)$. Thus,
a graph such as $S_{max}$ versus $T_{max}$ can be constructed
for given transmission rates $r_i$ and $r_r$. The benefit of
such graphs can be increased considerably if the ordinate
also shows the values of $E_v$ that correspond to
the values of $S_{max}$. By using estimates of the expected
voice load incoming to a cluster of customer representatives,
the number of data links required to accommodate
the cluster can be readily deduced from
the graph for each specified retrieval time objective.
This technique will be discussed further in the model
application section.

Plots of $S_{max}$ and $E_v$ versus the retrieval time objective
can aid investigations of the cost of implementing
a system to meet a specified average retrieval
time objective. For example, a designer may discover
that for a relatively small increase in the allowable
$T_{max}$, considerable savings in transmission and computer
port costs could be achieved by serving more
representatives with a single data link.

Number of data links required 

As was mentioned above, graphs showing $S_{max}$ and
$E_v$ versus the retrieval time objective can be used to
estimate $L(r_i, r_r)$, the number of data links required
to interconnect the computer and the input-output
terminals serving the customer representatives. There
is also a more analytical method for estimating $L(r_i, r_r)$
in which values of $\hat{\rho}_{tot}(r_i, r_r)$ are used. The same general
method can be applied to full-duplex and half-duplex
message transmission subsystems; however, it should
be remembered that for the full-duplex case, the value
of $\hat{\rho}_{tot}(r_i, r_r)$ may change if the value of the ratio
$\bar{t}_r/\bar{t}_i$ is changed.

Let $k$ represent the ratio of the total volume of
generated input and output data traffic (in erlangs)
to $\hat{\rho}_{tot}(r_i, r_r)$, i.e., let

$$k = \frac{\rho_{tot}(r_i, r_r)}{\hat{\rho}_{tot}(r_i, r_r)}.$$

Let $K$ represent the integer part of the ratio $k$. The
number of data links required to serve the cluster is
given by

$$L(r_i, r_r) = 1 + K. \tag{10a}$$

If $K = k$, i.e., $k$ is an integer, then

$$L(r_i, r_r) = K.$$

For half-duplex links, if it can be assumed that the
total volume $\rho_{tot}(r_i, r_r)$ is divided evenly among them,
the average occupancy of each link is given by

$$\rho = \frac{\rho_{tot}(r_i, r_r)}{L(r_i, r_r)}. \tag{10b}$$
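The link-count rule of Equations (10a) and (10b) can be sketched as follows. The tolerance used to decide whether $k$ is an integer and the numerical values in the usage note are assumptions made only for illustration.

```python
import math


def links_required(rho_tot: float, rho_hat_tot: float, tol: float = 1e-9) -> int:
    """Equation (10a): L = 1 + K, or L = K when k is an integer."""
    k = rho_tot / rho_hat_tot
    big_k = math.floor(k + tol)      # integer part of k, tolerant of rounding
    if abs(k - big_k) <= tol:        # k is (numerically) an integer
        return big_k
    return 1 + big_k


def occupancy_per_link(rho_tot: float, links: int) -> float:
    """Equation (10b): average occupancy when traffic is divided evenly."""
    return rho_tot / links
```

For example, with a generated volume of 1.3125 erlangs and a hypothetical allowance of 0.7 erlang per link, `links_required(1.3125, 0.7)` gives two links, each with an average occupancy of 1.3125 / 2.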

Model application 

In this section, the analytical model is applied 
to a hypothetical information retrieval design problem. 
For this example, it is assumed that information re- 
quired for the operation of a business, such as customer 
service and billing records, is to be stored in a com- 
puter. Input-output terminals will permit access to the 
computer files; it is assumed that retrievals are pri- 
marily required in order to intelligently handle tele- 
phone calls from customers. Several clusters of input- 
output terminals are to be served by the same computer 
complex. The cluster to be considered in this example 
is concerned only with information retrieval; it is 
assumed that file modifications are done by other per- 
sonnel. 

The basic configuration proposed for this cluster is 
illustrated in Figure 1. Telephone calls from customers 
are routed by the automatic call distributor to idle 
customer representatives. Each customer representa- 
tive is equipped with an input-output terminal. These 
terminals are associated with a common control unit, 
which is connected to the computer by means of a data 
link. 

One objective of this analysis is to determine the
basic requirements of the data communications subsystem,
i.e., how many common control units in conjunction
with their data links are required to accommodate
the number of customer representatives that
will be needed to handle the incoming telephone calls?
To help answer this question, both full-duplex and 
half-duplex methods of operation are considered. Fol- 
lowing the description of these assumed methods of 
operation, representative parameter values are used 
to indicate how these two proposals can be quanti- 
tatively compared. 

Description of assumed methods of operation 

The first assumed method of operation to be described
involves half-duplex message transmission,
which may have some economic advantages over full- 
duplex operation for some geographical configurations. 
Half-duplex operation is more suited for clusters 
generating and receiving relatively low data traffic 
volumes and for which retrieval time objectives are 
not critical. One disadvantage of half-duplex operation 
is the line time required to reverse the direction of 
transmission; this interval will be referred to as the 
reversal time. 

The disadvantage of reversal times can be partially 
overcome if the computer polls and delivers groups 
of messages to the common control unit instead of 
single messages to the individual input-output ter- 
minals. This method of operation will be referred to as 
group poll and delivery as opposed to single poll and 
delivery operation. When a large number of terminals 
are served by a control unit, group polling significantly 
reduces the line time required for reversing the direc- 
tion of transmission and transmitting polling characters. 
In addition, as the volume of data traffic increases, 
group polling lessens the variance of the interval from 
the time an input message is ready for transmission 
until it has actually been transmitted to the computer. 

When the reversal time durations are comparable 
to message transmission times, data link efficiency is 
increased considerably by allowing multimessage trans- 
missions for both input and response messages, i.e., 
priority is not assigned to either type of message. 
Line efficiency increases because a reversal is not re- 
quired following the transmission of each lower priority 
message in order to check the status of the higher 
priority message queue. 

With group polling and multimessage transmissions,
all input messages generated by the terminals since
the last poll are sequentially transmitted to the computer.
Only when the input message queue becomes
empty is the direction of transmission reversed. After
the reversal, the computer begins delivering the queue
of responses to the control unit, which distributes each
response to the appropriate terminal. After all responses
have been delivered, the computer polls the control
unit either immediately, or optionally after some
specified delay* $D_p$, and the cycle repeats. A fixed
number of characters that identify the control unit is
sent preceding message transmissions from the control
unit. This cycle of operation is illustrated in Figure 2A.

Figure 2A: Typical cycle of operation (Half-duplex message transmission).
Legend: p = transmission of polling code; r = transmission facility reversal;
$T_i$ = transmission of input messages; $T_r$ = transmission of responses;
$D_p$ = computer delay preceding next poll (assumed zero in this paper).

During a given cycle, either queue or even both 
queues can be found empty. If, for example, both are 
found empty, a group poll and delivery cycle de- 
generates to a polling sequence followed by a succession 
of reversal times, which are separated by a “No Traffic”
character sequence that identifies the control unit.
Such degenerate cycles are assumed to reoccur until 
at least one message accumulates in either the input 
or the response message queue. 

With the full-duplex case, reversal times are unneces- 
sary, since the control unit can be transmitting and 
receiving simultaneously. However, group polling of 
the control unit is still beneficial, since polling inter- 
ference on the delivery line occurs less frequently. With 
full-duplex operation, input messages generated by 
the customer representatives are ordered in a first-come, 
first-served manner for transmission from the control 
unit. Transmissions to the computer begin immediately 
after a polling code is received from the computer; 
the polling codes are interspersed among messages re- 
ceived from the computer. All messages that have 
accumulated awaiting the polling code, as well as those 
that are generated during the transmission, are transmitted
to the computer. An interval of duration** $D_p$
starts at the end of a transmission from the control
unit; at the end of this interval the computer sends
another polling code.


* For the half-duplex case, $D_p$ is assumed to be zero in this paper.

** For full-duplex operation, $D_p$ was assumed to be one second
in this paper.


As soon as a response is prepared by the computer, 
it is entered into an output queue for delivery. It is 
transmitted immediately unless another transmission
is already in progress; if so, the response is delayed until 
all responses ahead of it in the queue have been sent. 
Thus, while input messages are being transmitted by 
the control unit, response messages corresponding to 
previous input messages are being received by the 
control unit. Full-duplex message transmission is 
illustrated in Figure 2B.

Personnel to handle incoming voice traffic 

It will be assumed that the assumptions stated
previously concerning voice-access subsystems apply to
this example. It will further be assumed
that during the busy hour of the busy day the average
number of calls per hour is not expected to exceed
600; the average duration of each call is expected to
be approximately three minutes. By using Equation
(1), it is found that $E_v$, the expected voice traffic load,
should not exceed 30 erlangs. In order to determine $S$,
the number of customer representatives required to
handle this traffic volume, a grade-of-service objective
must be specified. In Figures 3A and 3B, $S$ has been
plotted as a function of $E_v$. Figure 3A shows G(10),
G(20), and G(30), where each is assumed equal to
0.95. Figure 3B shows the effect of varying the value
of G(20) from 0.90 to 0.975.
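The 30-erlang figure follows directly from Equation (1); a minimal sketch of the arithmetic is given below, with an illustrative function name that is an assumption of the sketch.

```python
def offered_load_erlangs(calls_per_hour: float, mean_call_minutes: float) -> float:
    """Equation (1) with the call rate per hour and the duration in minutes."""
    return calls_per_hour * mean_call_minutes / 60.0


# Busy-hour values quoted in the example: 600 calls per hour, three minutes each.
busy_hour_load = offered_load_erlangs(600, 3)
```

Because the erlang is dimensionless, the only requirement is that the rate and the duration use the same time unit; the division by 60 converts minutes to hours.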

Figures 3A and 3B reveal that the grade-of-service
standard for answering voice calls can be improved
considerably with the addition of relatively few
representatives. For example, assuming that the average
voice load during the busy hour is 30 erlangs and
the grade-of-service objective is such that calls should
be answered within 20 seconds, Figure 3B indicates
that the fraction of calls that meet the objective can
be increased from 0.9 to 0.975 by increasing the number
of representatives from 38 to 42. These additional
representatives could be individuals who are assigned
as representatives only during busy hour conditions.

Figure 2B: Typical cycle of operation (Full-duplex message transmission)

Figure 3A: Effect of grade of service on number of
personnel required [G(x) = 0.95]

Figure 3B: Effect of grade of service on number of
personnel required [G(20) = y]. Horizontal axis: $E_v$, average voice load (erlangs).

Conversion of offered voice load to data traffic 

The amount of data traffic generated is dependent on 
the degree of interaction between customer repre- 
sentatives and the computer. As an example, pro- 
cedures could be outlined that would minimize the num- 
ber of computer interactions per call by simply trans- 
mitting in a single response as much as possible of the 
information in a computer file. On the other hand, if the 
intent were to minimize the information that must be 
read by representatives, several interactions could be 
used during which the computer eliminated most of 
the undesired information. Computer processing limita- 
tions would favor the former method of operation; 
human factors considerations may favor the latter (Reference 3).

An illustration of the effect of interactions on the
amount of data traffic generated for a half-duplex
method of operation is presented in Figure 4, which
shows $\rho_{tot}(r_i, r_r)$ as a function of $E_v$. Let Type I interactions
be those in which whole pages of information
are transmitted to the representative; parameter
values assumed are $\bar{l}_r = 300$ characters and $I = 3$
interactions. Let Type II interactions be those in
which more specific items of information can be requested;
values assumed are $\bar{l}_r = 75$ characters and
$I = 6$ interactions. Figure 4 indicates that in order to
accommodate 30 erlangs, the less interactive method
would require at least two half-duplex data links, whereas
one link may suffice for the Type II method, depending
on the specified retrieval time objective.
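The comparison can be checked numerically with Equation (8). The sketch below assumes a three-minute average call, 15-character input messages, and 120 characters per second in both directions; these values are quoted elsewhere in the example, and applying them to Figure 4 is an assumption.

```python
def half_duplex_load(e_v, v_bar, interactions, l_i, l_r, rate):
    """Equation (8) with equal rates: total erlangs offered to the link."""
    lam_i = interactions * e_v / v_bar      # input messages per second
    return lam_i * (l_i + l_r) / rate


E_V = 30.0      # erlangs of voice load
V_BAR = 180.0   # assumed average call duration, seconds
L_I = 15.0      # assumed average input length, characters
RATE = 120.0    # assumed transmission rate, characters per second

type_1 = half_duplex_load(E_V, V_BAR, 3, L_I, 300.0, RATE)   # whole pages
type_2 = half_duplex_load(E_V, V_BAR, 6, L_I, 75.0, RATE)    # specific items
```

Under these assumptions the Type I procedure offers 1.3125 erlangs, more than one half-duplex link can carry at any occupancy, while the Type II procedure offers 0.75 erlang, which one link can carry only if the retrieval time objective tolerates that occupancy.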

Volume of data traffic allowed per data link 

For this example it has been assumed that the retrieval
time objective would be stated as an average
value, i.e., as $T_{max}$. Mathematical queuing models
have therefore been used for this example to aid in
determining values of $\hat{\rho}_{tot}(r_i, r_r)$. Separate models were
used to represent the half-duplex and full-duplex
methods of operations; descriptions of these models
and associated formulas are presented in the Appendix*;
a derivation of the formulas for the half-duplex model
appears in Reference 4.


* A computer simulation was used to verify the queuing model
of the half-duplex method of operation. Values of $T$ obtained with
the queuing model were found to be conservative estimates.
Additional discussion concerning the results of the queuing model
and the simulation appears in the Appendix.

Figure 4: Conversion of voice load to data traffic


The correspondence between $\hat{\rho}_{tot}(r_i, r_r)$ and $T_{max}$ was
actually established in reverse, i.e., values of the average
retrieval time $T$ were calculated as a function of the
total volume (in erlangs) of input and output data traffic
carried by the data link. The five durations included
in this retrieval time calculation are the following:

$D_i$ = Delay of an input message awaiting transmission

$t_i$ = Transmission time of the input message

$C_p$ = Computer processing time, i.e., interval from
the arrival of the input message until the
appropriate response is entered in an output
queue

$D_r$ = Delay of the response in a computer output
queue

$t_r$ = Transmission time of the complete response.

$T$ was obtained by summing the mean values of these
intervals, i.e.,

$$T = D_i + \bar{t}_i + C_p + D_r + \bar{t}_r. \tag{11}$$

The queuing models were used to determine values
of $D_i$ and $D_r$. Since the server in these models represents
the data link, these delay values depend on
the average occupancy of the data link. For the half-duplex
case, the average occupancy is numerically equal
to $\rho_{tot}(r_i, r_r)$, provided $\rho_{tot}(r_i, r_r) < 1$. For the
full-duplex case, the average occupancies of the input
and output links are numerically equal to $\rho_i(r_i)$ and
$\rho_r(r_r)$, respectively, provided $\rho_i(r_i)$ and $\rho_r(r_r)$ are
both less than 1.

Values for $\bar{t}_i$ and $\bar{t}_r$ were obtained from Equations
(5) and (6). The value of $C_p$ was chosen to be two
seconds; for other analyses, the value should be chosen
to fit the characteristics and expected load of the
system computer. With $C_p = 0$, it should be noted
that $T$ represents the average retrieval time due solely
to data communications, i.e., message queuing and
message transmission.

In Figures 5A and 5B, T is plotted as a function of 
the erlangs of data carried per link for the half-duplex 
and the full-duplex cases, respectively; in each plot, 
T is shown for different average response lengths. For 
the half-duplex case, the erlangs of data carried per 
link is equivalent to the average data link occupancy. 
For each of the plots, as the erlangs of carried traffic
approach zero, $T$ approaches the sum of $\bar{t}_i$, $\bar{t}_r$, $C_p$, and
$R$, where $R = 0.2$ seconds for the half-duplex case
and zero for the full-duplex case. Figure 5A can be
converted into plots of $T$ versus $E_v$ by reference to
Figure 4.
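The low-traffic limit quoted above gives a floor on the average retrieval time and can be evaluated directly from Equations (5) and (6). The sketch assumes 15-character inputs, 120 characters per second in both directions, and a two-second processing time, as quoted for these plots; the function name is illustrative.

```python
def low_load_retrieval_time(l_i, l_r, rate, c_p, reversal):
    """Limit of T as carried traffic approaches zero: t_i + t_r + C_p + R."""
    return l_i / rate + l_r / rate + c_p + reversal


# Half-duplex floors (R = 0.2 s) for 300- and 75-character responses.
floor_type_1 = low_load_retrieval_time(15.0, 300.0, 120.0, 2.0, 0.2)
floor_type_2 = low_load_retrieval_time(15.0, 75.0, 120.0, 2.0, 0.2)
# Full-duplex floor (R = 0) for 75-character responses.
floor_full_duplex = low_load_retrieval_time(15.0, 75.0, 120.0, 2.0, 0.0)
```

These floors are about 4.8 seconds and 2.95 seconds for the two half-duplex cases, so any objective below the floor cannot be met at any occupancy unless a parameter such as the transmission rate or $C_p$ is changed.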

Other communications parameter values assumed 
for the plots in Figures 5A and 5B are as follows: 

$\bar{l}_i$ = 15 characters
$c^2(l_i)$ = 0.1*
$c^2(l_r)$ = 0.5
$C_p$ = 2 seconds

$r_i = r_r$ = 120 characters per second.

The queuing models permit values of each of these 
parameters to be varied individually or in various 
combinations; by observing the results of such varia- 
tions, insight is gained concerning which parameters 
most significantly affect $T$. As was mentioned previously,
the graphs can also be used in reverse to determine
the effect of parameter variation on values of
$\hat{\rho}_{tot}(r_i, r_r)$ for specified values of $T_{max}$.
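Reading such a graph in reverse amounts to finding the largest occupancy whose average retrieval time does not exceed $T_{max}$. A minimal bisection sketch is shown below; it assumes only that $T$ increases with occupancy, and the curve used in the usage note is a hypothetical stand-in, not one of the paper's queuing formulas.

```python
def max_load_for_objective(mean_retrieval_time, t_max, lo=0.0, hi=0.999, iters=60):
    """Largest occupancy in [lo, hi] with mean_retrieval_time(occupancy) <= t_max.

    Returns None when the objective cannot be met even at the lowest occupancy.
    Assumes mean_retrieval_time is increasing in occupancy.
    """
    if mean_retrieval_time(lo) > t_max:
        return None
    if mean_retrieval_time(hi) <= t_max:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean_retrieval_time(mid) <= t_max:
            lo = mid     # objective still met, move the lower bound up
        else:
            hi = mid     # objective violated, move the upper bound down
    return lo


def toy_curve(rho):
    """Hypothetical retrieval-time curve that grows without bound as rho -> 1."""
    return 2.0 + 0.5 / (1.0 - rho)
```

With the toy curve, `max_load_for_objective(toy_curve, 4.0)` converges to an occupancy of 0.75, while an objective of 2.4 seconds returns `None` because the curve starts at 2.5 seconds.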

Number of customer representatives allowed 
per data link 

By relating values of $S$ and $\rho_{tot}(r_i, r_r)$ appearing in
Figures 3 and 4, respectively, that correspond to equal
values of $E_v$, a graph of $S$ versus $\rho_{tot}(r_i, r_r)$ was obtained.
This graph was then used in conjunction with
Figure 5A, which shows $T$ versus $\rho_{tot}(r_i, r_r)$, in order
to obtain Figure 6A, which shows $S_{max}$ versus $T_{max}$
for the half-duplex method of operation. Values of
$E_v$ that correspond to values of $S_{max}$ for G(20) = 0.95
are indicated on the right-hand vertical boundary of
the graph. Curves are plotted to depict the effect of
Type I and Type II interactions. Values of $T_{max}$ that
fall to the right of these curves can be achieved.

* The coefficient of variation of a random variable $y$, which is
denoted by $c^2(y)$, is defined as follows: $c^2(y) = \text{Var}(y)/\bar{y}^2$.

Figure 5A: Effect of response length on average
retrieval time (Half-duplex message transmission).
Horizontal axis: erlangs of data carried per link (average data link occupancy).

Figure 5B: Effect of response length on average retrieval time
(Full-duplex message transmission).
Horizontal axis: erlangs of data carried per link pair (input plus output traffic).

Figure 6A: Effect of response length on personnel
allowed per data link (Half-duplex message
transmission). Horizontal axis in seconds.

The graph indicates that for the indicated parameter
values, the more interactive procedure allows
considerably more personnel to be served by a single
data link. With Type I interaction, i.e., the less interactive
procedures, a $T_{max}$ of five seconds cannot be
met. However, with Type II interactions, this objective
can be met for values of $S$ less than approximately
35 representatives, which would be required to handle
an incoming voice load of approximately 26 erlangs.
If a $T_{max}$ of three seconds is desired, it is obvious that
some of the parameter values must be changed. Perhaps
the transmission rates $r_i = r_r$ could be increased, or
if possible, $C_p$ could be reduced. Trade-offs can thus
be studied between data communications and computer
processing capabilities.

Figure 6B shows $S_{max}$ versus $T_{max}$ for full-duplex
message transmission. As expected, the graph indicates
that full-duplex operation allows more representatives
to be served on one data link than does half-duplex
operation. Figure 6B also reveals that with full-duplex
message transmission, a $T_{max}$ of five or possibly four
seconds can be met with one data link while handling
30 erlangs of incoming voice traffic. In comparison,
reference to Figure 6A reveals that with half-duplex
operation, two data links would be required to meet a
$T_{max}$ of five seconds with $E_v = 30$ erlangs; with one
data link, $T$ would equal approximately seven seconds.

Figure 6B: Effect of response length on personnel allowed
per data link (Full-duplex message transmission).
Horizontal axis: $T_{max}$, average retrieval time objective (seconds).

SUMMARY 

An analytical model has been presented that can be 
used for preliminary investigations of the voice and 
data communications aspects of inquiry-response sys- 
tems. The model can be used to gain insight and to 
obtain conservative estimates of communications capa- 
bilities required in order for a system to meet speci- 
fied grade-of-service objectives. 

In particular, the mathematical relationships in
the model can be used to estimate quantities such as
the number of customer representatives required to
handle incoming voice traffic and the volume of data
traffic generated as a result of this voice traffic.
These estimates in conjunction with retrieval time 
estimates are used to predict the number of data links 
required and the number of equivalently active input- 
output terminals that can be served by a data link 
without exceeding a specified retrieval time objective. 

The model is useful for studying the sensitivity of 
the voice and data communications requirements to 
changes in various communications parameter values. 
This insight can aid in limiting the cost of subsequent 
detailed system simulations. Also, the model can be used 
iteratively to determine which combinations of param- 
eter values permit specified voice traffic and retrieval 
time objectives to be met most economically. 

As an illustration, the model is applied to a hypothetical
system. Requirements for full-duplex and half-duplex
message transmission are compared. The assumed
methods of operation are characterized by group
polling of and delivery to a common control unit rather
than individual input-output terminals. For this application,
estimates of average retrieval time as a function
of erlangs of input and output data traffic were obtained
by using delay formulas from mathematical queuing
models.

APPENDIX

Queuing models were used in the application section 
of this paper to represent the assumed methods of 
operation of the data communications subsystem. This 
appendix contains a description of these queuing models 
as well as the associated delay formulas used for calculating
$D_i$ and $D_r$, two of the terms in the expression
for $T$.

Half-duplex message transmission 

The queuing model selected to represent the assumed 
half-duplex method of operation is a single-server 
dual-queue model (Reference 4) in which service is alternated
between the two queues; a finite interval is required to
switch service from one queue to the other. Each queue 
is assumed to have an independent Poisson input and 
an independent general service time distribution. The 
alternating priority rule is followed. With this rule, 
all customers entering a queue while that queue is 
being served are also served; when that queue even- 
tually becomes empty, service can be switched to the 
other queue. 

In this model the single server represents the data
link that alternately allows transmission of the input
messages that accumulate in the control unit and the
responses that accumulate in the computer. The
service times in the model represent the intervals $t_i$
and $t_r$ required to transmit individual messages. The
switching, or reversal, times represent the intervals 
required to reverse the direction of data link trans- 
mission. For calculation purposes, it can be assumed 
that the reversal times also include the constant 
intervals required to transmit a fixed number of char- 
acters for supervisory purposes. Examples are polling 
sequences to request input messages from a control 
unit and identification sequences that precede input 
messages to identify the transmitting control unit. 

Assuming the transmission times $t_i$ and $t_r$ have
mean values $\bar{t}_i$ and $\bar{t}_r$ and coefficients of variation
$c^2(t_i)$ and $c^2(t_r)$, and assuming the facility reversal
time $R$, the polling time $P_1$, and the control unit identification
time $P_2$ have constant durations, the formula for
$D_i$ is as follows:

$$D_i = \frac{\rho_i \bar{t}_i g_i}{2(1 - \rho_i)} + \frac{\rho_r g_r \bar{t}_r (1 - \rho_i)^2 + \rho_i g_i \bar{t}_i \rho_r^2}{2(1 - \rho_i)(1 - \rho)(1 - \rho + 2\rho_i \rho_r)} + \frac{(1 - \rho_i)(J_1 + J_2)}{2(1 - \rho)}$$

where

$\rho$ = average occupancy of the data link $= \rho_i + \rho_r < 1$

$\rho_i = \lambda_i \bar{t}_i < 1$

$\rho_r = \lambda_r \bar{t}_r < 1$

$J_1 = (R + P_1)$

$J_2 = (R + P_2)$

$g_i = [1 + c^2(t_i)]$

$g_r = [1 + c^2(t_r)]$

Note that as $\rho \to 0$,

$$D_i \to \frac{(J_1 + J_2)}{2}.$$

The formula for $D_r$ is identical to the one shown for $D_i$
with all $i$ subscripts changed to $r$'s and vice versa.

Two additional formulas that may be helpful in
estimating storage usage at the control unit and at the
computer are the following, which give the average
number of input messages and responses, respectively,
that would be included in a multimessage transmission:

$$N_i = \frac{\lambda_i [J_1 + J_2]}{(1 - \rho)}$$

$$N_r = \frac{\lambda_r [J_1 + J_2]}{(1 - \rho)}$$
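The half-duplex delay and batch-size formulas can be sketched as a single function. The expression for the delay follows the three-term form read from the appendix (own-queue term, cross-queue term, and switching term); the function and argument names, and any numerical values for $J_1$ and $J_2$, are assumptions of the sketch.

```python
def half_duplex_delays(lam_i, lam_r, t_i, t_r, c2_i, c2_r, j1, j2):
    """Return (D_i, D_r, N_i, N_r) for the alternating-priority model.

    lam_*: message rates per second; t_*: mean transmission times in seconds;
    c2_*: coefficients of variation c^2(t); j1, j2: switching intervals J_1, J_2.
    """
    rho_i, rho_r = lam_i * t_i, lam_r * t_r
    rho = rho_i + rho_r
    if rho >= 1.0:
        raise ValueError("average data link occupancy must be below 1")
    g_i, g_r = 1.0 + c2_i, 1.0 + c2_r

    def delay(rho_a, t_a, g_a, rho_b, t_b, g_b):
        # Queue "a" is the one whose delay is computed; "b" is the other queue.
        own = rho_a * t_a * g_a / (2.0 * (1.0 - rho_a))
        cross = (rho_b * g_b * t_b * (1.0 - rho_a) ** 2
                 + rho_a * g_a * t_a * rho_b ** 2) / (
            2.0 * (1.0 - rho_a) * (1.0 - rho) * (1.0 - rho + 2.0 * rho_a * rho_b))
        switching = (1.0 - rho_a) * (j1 + j2) / (2.0 * (1.0 - rho))
        return own + cross + switching

    d_i = delay(rho_i, t_i, g_i, rho_r, t_r, g_r)
    d_r = delay(rho_r, t_r, g_r, rho_i, t_i, g_i)   # i and r subscripts swapped
    n_i = lam_i * (j1 + j2) / (1.0 - rho)
    n_r = lam_r * (j1 + j2) / (1.0 - rho)
    return d_i, d_r, n_i, n_r
```

Two checks are useful when adapting the sketch: at very low traffic both delays should approach $(J_1 + J_2)/2$, and when both queues have identical parameters the two delays should be equal.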

A computer simulation was used to determine how 
well this queuing model represents the assumed meth- 
od of operation. Values of T obtained with the queuing 
model were found to be conservative. In general, the 
best agreement was obtained as long as values of ρ were 
less than 0.6 to 0.7; differences were within a range 
from zero to 15 percent. With most combinations of 
parameter values, the disparity increased significantly 
for values of ρ exceeding 0.8; i.e., the queuing model gave 
overly conservative estimates of T. Agreement im- 
proved as the value of a — ( T /ti decreased and/or the 
value of Ti = r r increased. 

The disparity can be explained as follows: in the 
simulation the arrival pattern of responses in the com- 
puter output queue was not quite as random as is ex- 
pected for Poisson arrivals, which are assumed in the 
queuing model. A principle of queuing theory is that 
as the regularity of arrivals and service times increases, 
the average delay decreases. 5 Excellent agreement 
between the results of the queuing model and the simula- 
tion was achieved when the value of λ_r used in the 
queuing model was set equal to 0.9 times the λ_r used 
for the simulation. 

Full-duplex message transmission 

Independent models were selected to represent the 
input message queue and the response queue in the 
assumed full-duplex method of operation. Polling inter- 
ference on the delivery line was assumed to be negligible. 

For the response queue, the classical M/G/1 model was 
assumed. For the input message queue, an accumulation 
interval of D_p seconds was assumed prior to each poll. 
This situation was modeled as an M/G/1 queue with a 
setup time of D_p. Assumptions for t_i and t_r are the 
same as stated for the half-duplex case. Formulas for 
D_i and D_r are as follows: 

\[
D_i = \frac{\lambda_i \bar{t}_i^{\,2}\,[1 + c^2(t_i)]}{2(1 - \rho_i)} + \frac{D_p}{2},
\qquad
D_r = \frac{\lambda_r \bar{t}_r^{\,2}\,[1 + c^2(t_r)]}{2(1 - \rho_r)}
\]

Expressions for N_i and N_r for this case are as follows: 

\[
N_i = \frac{\lambda_i D_p}{1 - \rho_i}, \quad D_p > 0;
\qquad
N_r = \frac{1}{1 - \rho_r}
\]

INTRODUCTION 

In order to reduce the communications costs in time- 
sharing systems and multicomputer communication 
systems, multiplexing techniques have been introduced 
to increase channel utilization. A commonly used 
technique is Synchronous Time Division Multiplexing 
(STDM). In STDM, taking as an example the transmission 
of messages from terminals to a computer, each terminal 
is assigned a fixed time duration. After one user's time 
duration has elapsed, the channel is switched to another 
user. 
With synchronous operation, buffering is limited to 
one character per user line, and addressing is usually 
not required. The STDM technique, however, has 
certain disadvantages. As shown in Figure 1, it is 
inefficient in capacity and cost to permanently assign 
a segment of bandwidth that is utilized only for a 
portion of the time. A more flexible system that effi- 
ciently uses the transmission facility on an "instanta- 
neous time-shared" basis could be used instead. The 
objective would be to switch from one user to another 
whenever the first user is idle, and to asynchro- 
nously time multiplex the data. With such an arrange- 
ment, each user would be granted access to the channel 
only when he has a message to transmit. This is known 
as an Asynchronous Time Division Multiplexing 
(ATDM) system. A segment of a typical ATDM data 
stream is shown in Figure 2. The crucial attributes of 
such a multiplexing technique are: 

1. An address is required for each transmitted 
message, and 

2. Buffering is required to handle the random 
message arrivals.** 

If the buffer is empty during a transmission interval, 
the channel will be idle for this interval. 

An operating example of an ATDM system for 
analog speech is the "Time Assignment Speech Inter- 
polation" (TASI) system used by the Bell System on 
the Atlantic Ocean Cable. 1 Using TASI, the effective 
transmission capacity has been doubled and the system 
operates with a negligible (with respect to voice trans- 
mission) overflow probability of about 0.5 percent, 
even without buffering. 

The feasibility of the ATDM system depends on: 
(1) an acceptably low overflow probability (of the 
same or lower order of magnitude as the line error 
rate) that can be achieved by a reasonable buffer 
size, and (2) an acceptable expected message queuing 
delay due to buffering. To estimate these parameters, 
analyses of the statistical behavior of the buffer are 
presented below. The user-to-computer traffic is in 


** There may be other reasons for providing buffering such as: 
tolerating momentary loss of signals (e.g., fading), momentary 
interruptions of data flow, permitting error control on the line, 
etc. Under these conditions, the buffer should be designed to 
satisfy also the above specific requirements. 

Figure 1 — Time-division multiplexing (diagram labels: first 
cycle, second cycle; data from user "A" at the i-th cycle; in 
asynchronous time-division multiplexing, extra bandwidth is 
available for additional users) 

Figure 2 — Asynchronous time division multiplexing 
data stream: (a) user-to-computer data structure; 
(b) computer-to-user data structure (legend: ADS = address, 
E = end of message) 

Figure 3 — Asynchronous time division multiplexing 
system for time-sharing computer communications 

units of characters, while the computer-to-user traffic is 
in units of strings of characters, which we shall call bursts. 
The lengths of the bursts differ from one to 
another and are treated as random variables. Because 
of the asymmetrical nature of the traffic characteristics, 
the statistical behaviors of the buffers in the user-to- 
computer multiplexer and the computer-to-user multi- 
plexer are quite different and, therefore, are treated 
separately. An example is given to illustrate the multi- 
plexer design in a time-shared computer-communi- 
cations system that employs the ATDM technique. 

Analysis of buffer behavior 

User-to-computer buffer 

An ATDM system consists of a buffer, encoding/ 
decoding circuit, and a switching circuit (in the case of 
multiple multiplexed lines) as shown in Figure 3. For 
the analysis of the statistical behavior of user-to- 
computer buffer, the character (fixed length) arrivals 


from the sources to the buffer are assumed to be gener- 
ated from a renewal counting process; that is, the 
character interarrival times are independent and 
identically distributed. Since the line transmits with 
constant speed, the time it takes to transmit each 
fixed length character (service time), 1/μ, is assumed 
to be constant. For reliability and simplicity in data 
transmission, synchronous transmission is assumed. 
The data are taken out synchronously from the buffer 
for transmission at each discrete clock time. The data 
arriving at the buffer during the periods between clock 
times have to wait to begin transmission at the begin- 
ning of the next clock time, even if the transmission 
facility is idle at the time of arrival. In queuing theory 
terminology, the above system implies there is a gate 
between the server and waiting room which is opened 
at fixed intervals. Thus we shall analyze the queuing 
model† with finite buffer size (waiting line) and synchro- 
nous multiple transmission channels (servers). Powell 
and Avi-Itzhak 2 analyzed a similar queuing model 
with an unlimited waiting line. Birdsall, 3 and later 
Dor 4 analyzed a queuing model with limited waiting 
room but with a single server. Here, the model is 
generalized to accommodate multiple servers with 
limited waiting room. 

To establish the set of state equations for analysis 
of a buffer with a size of N characters and c servers, 
we assume that the system has reached its equilibrium. 
Let p_n be the probability that there are exactly n 
characters in the system (in the buffer and in service) 
at the end of a service time, and a_c be the probability that 

† The results derived from this study can also be used as a con- 
servative estimate (upper bound) for the case in which the lines 
are permitted to transmit the characters that arrive during the 
service interval. The estimate yields a better approximation for 
heavy than for light traffic intensity, because under heavy 
traffic the lines are usually all busy, and the characters that 
arrive during the service interval have to wait and cannot be 
serviced during that interval. The maximum overdesign in 
a buffer system with c transmission lines that permits transmission 
of the characters arriving during the service interval is c characters. 

there are no more than c characters in the system at 
that time, i.e., 

\[
a_c = \sum_{i=0}^{c} p_i \tag{1}
\]

Without loss of generality, we can let the service 
interval be equal to unity. We shall express the proba- 
bility of the number of characters present in the buffer at 
the end of the unit service time interval (left side of 
equation (2)) in terms of the probability of the number 
present in the system at the beginning of the interval 
(right side of equation (2)), multiplied by the proba- 
bility of a given number of characters arriving during 
the service interval. As this can occur in different 
combinations, we add the probabilities. With synchro- 
nous transmission, all characters in service would 
finish their service and leave this system at the end of 
a service interval. 

Thus in a unit service interval of time, we have 

\[
\begin{aligned}
p_0 &= a_c \pi_0 \\
p_1 &= a_c \pi_1 + p_{c+1}\pi_0 \\
p_2 &= a_c \pi_2 + p_{c+1}\pi_1 + p_{c+2}\pi_0 \\
    &\;\;\vdots \\
p_n &= a_c \pi_n + p_{c+1}\pi_{n-1} + \cdots + p_{c+n-1}\pi_1 + p_{c+n}\pi_0,
      \quad n \le N - c \\
p_n &= a_c \pi_n + p_{c+1}\pi_{n-1} + \cdots + p_{N-1}\pi_{n+1-(N-c)} + p_N\pi_{n-(N-c)},
      \quad N \ge n > N - c \\
\sum_{i=0}^{N} p_i &= 1
\end{aligned}
\tag{2}
\]

Due to limited buffer size, 

\[
p_{i>N} = 0 \tag{3}
\]

where 

π_n = probability of n characters originating from a 
renewal counting process during a service 
interval 

N = buffer length in characters 
c = number of transmission lines 

The first equation describes the case in which the 
buffer is vacant if no more than c characters are in 
transmission at the beginning of the interval and no 
arrivals occur during the interval. The second equation 
describes the case in which one character is in the buffer 
if no more than c characters are in transmission at the 
beginning and one arrives during the service time 
interval, or there are c + 1 in the buffer at the be- 
ginning and no character arrives during the service 
interval, etc. In the numerical computation carried 
out in this paper, we assume the character arrivals 
are generated from a Poisson process; that is, π_n = 
exp(-λ_u) λ_u^n / n!, where λ_u is the average character ar- 
rival rate to the user-to-computer buffer (offered load) 
from the m independent users. Since the buffer has a 
finite size of N, p_{i>N} = 0. Thus, when a character 
arrives and finds the buffer full, an overflow will 
result. Therefore, the average character departure rate 
from the user-to-computer buffer (carried load), α_u, is 
less than the offered load from the users, λ_u. The carried 
load can be computed from the buffer busy period: 

\[
\alpha_u = \sum_{i=0}^{c-1} i\,p_i + c \sum_{i=c}^{N} p_i \tag{4}
\]

The overflow probability of the user-to-computer 
buffer, the expected fraction of total number of charac- 
ters rejected by the buffer, is then equal to 

\[
P_{of} = \frac{\text{offered load} - \text{carried load}}{\text{offered load}}
       = 1 - \frac{\alpha_u}{\lambda_u} \tag{5}
\]

The traffic intensity from user-to-computer, ρ_u, 
measures the degree of congestion and indicates the 
impact of a traffic stream upon the service streams. 
It is defined as 

\[
\rho_u = \frac{\lambda_u}{c\mu} \tag{6}
\]

Channel (server) utilization, η, measures the fraction 
of time that the lines are busy. It can be expressed as 

\[
\eta = \frac{(1 - P_{of})\lambda_u}{c\mu} = \frac{\alpha_u}{c\mu} \le \rho_u \tag{7}
\]

Since physically it is impossible for the transmission 
lines to be more than 100 percent busy, the utilization 
is limited to a numerical value less than unity. In the 
no-loss case (unlimited buffer size), P_of = 0, and then 
η = ρ_u. 

The time average queue length in the user-to- 
computer buffer, L_u, is equal to 

\[
L_u = \sum_{i=c}^{N} (i - c)\,p_i + \frac{\lambda_u}{2} \ \text{characters},
\quad \text{for } N > c \tag{8}
\]

The first term in Equation (8) is the expected number 
of characters in the system at the beginning of a service 
interval. Since the characters could not leave the 
system during the service interval, we add the time 
average number of character arrivals (for Poisson 
arrivals) during the service interval, which is λ_u/2. The 
expected (time average) queuing delay of each charac- 
ter at the user-to-computer buffer due to buffering, 
D_u, can be evaluated by using Little's 6 result. We have 

\[
D_u = \frac{L_u}{\lambda_u (1 - P_{of})} \ \text{service times} \tag{9}
\]

For the single server case, that is, c = 1, the set of 
state equations (2) becomes an imbedded Markov 
Chain, and can be solved iteratively to obtain the 
state probabilities as shown in References 3 and 4. 
For the multiple server case, however, the multiple 
dependence on the various states prevents us from 
using the iterative techniques for solution. Thus, the 
set of state probabilities, the p_i's, must be solved from the 
set of linear matrix equations (2). The overflow proba- 
bility, queuing delay, and queue length are then com- 
puted from the p_i's via Equations 4, 5, 8 and 9. 

The size of the matrix (Equation 2) corresponds to 
the buffer length. The matrix equation was solved by 
the Gauss elimination method. 6 For purposes of ac- 
curacy, double precision was used in all phases of the 
computation. From the character arrival rate, λ_u, the 
coefficient values can be computed from (2) and they 
are stored in the computer program. Due to the limi- 
tation of the computer word size, double precision on 
the IBM 360/65 provides 15-digit accuracy. Therefore, 
when the coefficient value is less than 10^-16, it is set 
equal to zero. The computation time required to solve 
this type of system equation is largely dependent on 
its size. For a 10 × 10 matrix the computation time is 
about 0.8 seconds, while a 50 × 50 matrix equation 
takes about 1.67 minutes. 
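
The procedure just described (build the linear system from the state equations, solve it by Gauss elimination, then apply Equations 4, 5, 8 and 9) can be sketched in Python. This is an illustrative sketch, not the authors' program. It assumes Poisson arrivals with mean `lam` per unit service interval, uses the balance equations for n = 0, ..., N - 1 together with the normalization condition, and all function names are hypothetical.

```python
import math


def poisson_pmf(lam, kmax):
    """Return [pi_0, ..., pi_kmax] for Poisson arrivals with mean lam."""
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs


def solve_linear(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def user_buffer(lam, c, N):
    """Return (p, overflow probability, queue length, delay) for c lines, size N."""
    pi = poisson_pmf(lam, N)
    size = N + 1
    a = [[0.0] * size for _ in range(size)]
    for n in range(N):                      # balance equations, as in (2)
        for i in range(size):
            left = max(i - c, 0)            # characters remaining after service
            if n >= left:
                a[n][i] += pi[n - left]
        a[n][n] -= 1.0
    a[N] = [1.0] * size                     # normalization: sum of p_i = 1
    p = solve_linear(a, [0.0] * N + [1.0])
    carried = sum(min(i, c) * p[i] for i in range(size))              # (4)
    p_of = 1.0 - carried / lam                                        # (5)
    queue = sum((i - c) * p[i] for i in range(c, size)) + lam / 2.0   # (8)
    delay = queue / (lam * (1.0 - p_of))                              # (9)
    return p, p_of, queue, delay


# Single line at traffic intensity 0.8 with a 28-character buffer.
p, p_of, queue, delay = user_buffer(0.8, 1, 28)
```

With unit service time the traffic intensity is `lam / c`, so a two-line system at intensity 0.6 would be called as `user_buffer(1.2, 2, 14)`.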

Numerical results are presented in Figures 4, 5 and 
6. These results reveal the relationships among the 
overflow probabilities, number of transmission lines 
used, traffic intensities, and buffer sizes. 

Computer-to-user buffer 

In a previous section, the buffer behavior has been 
analyzed for a finite queue with multiple servers, 
Poisson arrivals, and constant service time, which 
corresponds to the user-to-computer traffic. The 
computer-to-user traffic, however, is quite different 
from the user-to-computer traffic. The central pro- 
cessor of a time-sharing computer sequentially per- 
forms fractions of each user's job, and the output 
traffic to the users consists of strings of characters, which 
we shall call bursts. The lengths of the bursts differ 
from one to another and are treated as random vari- 
ables. It is assumed that the internal processing speed 
of the computer is very fast as compared to the line 
transmission speed. Further, it is assumed that the 
various processing tasks generated by the user-com- 
puter interactions are independent from one user to 
another and have exponential interarrival times for 
a given user. In ATDM operation with these assump- 
tions, the arrivals of bursts at the common output 
transmission buffer for the group of users are approxi- 
mated as random. In this section, we shall analyze this 
buffer behavior under the assumptions of a finite queue, 
single server with batch (burst) arrivals, and constant 
service time. 

Figure 4 — Overflow probability vs buffer size 

Figure 5 — Expected queuing delay vs buffer size 

Figure 6 — Expected queuing delay vs traffic intensity 

Using the burst length and traffic intensity as param- 
eters, we would like to find the relationships among 
the overflow probabilities, expected burst delays due 
to buffering, and buffer sizes. 

Let us consider the case in which the burst length, L, 
is geometrically distributed with mean l̄ = 1/θ, and 
the number of bursts arriving during a unit service 
interval (the time to transmit a character on the multi- 
plexed line), N, is Poisson distributed with mean λ_c 
bursts/service time. The distributions of L and N are 
as follows: 

\[
f_L(l) = \theta (1 - \theta)^{l-1}, \quad l = 1, 2, \ldots \tag{10}
\]

\[
f_N(n) = \exp(-\lambda_c)\,\lambda_c^{\,n} / n!, \quad n = 0, 1, 2, \ldots \tag{11}
\]

The total number of characters that arrive during 
the time to transmit a character on the multiplexed 
line is a random sum, S_N, and is equal to 

\[
S_N = \sum_{i=1}^{N} L_i \tag{12}
\]

where L_i, a random variable distributed as (10), is 
the number of characters contained in the ith arriving 
burst, and N, a random variable distributed as (11), is 
the total number of bursts arriving during the unit 
service interval. For simplicity in notation, we let 
S = S_N. 

The characteristic function of S, φ_S(u), can be ex- 
pressed in terms of the characteristic function of L, 
φ_L(u), and λ_c: 

\[
\phi_S(u) = \exp[-\lambda_c + \lambda_c \phi_L(u)] \tag{13}
\]

Since the burst lengths are geometrically distributed, 
the characteristic function of L is 

\[
\phi_L(u) = \frac{\theta \exp(iu)}{1 - (1 - \theta)\exp(iu)} \tag{14}
\]

where i = \sqrt{-1}. Substituting (14) into (13), we have 

\[
\phi_S(u) = \exp\left[-\lambda_c
          + \frac{\lambda_c\,\theta \exp(iu)}{1 - (1 - \theta)\exp(iu)}\right] \tag{15}
\]


From (15), it can be shown that the probability 
density of j characters arriving during a unit service 
interval, f(S = j) = f_j, is a compound Poisson distri- 
bution as shown in (16): 

\[
f_j = f(S = j) =
\begin{cases}
\displaystyle \sum_{k=1}^{j} \binom{j-1}{k-1} (\lambda_c \theta)^k
  (1 - \theta)^{j-k} \exp(-\lambda_c)/k!, & j = 1, 2, \ldots \\[2ex]
\exp(-\lambda_c), & j = 0
\end{cases}
\tag{16}
\]
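
As a numerical check on (16), the compound Poisson probabilities can be computed directly from the sum and compared with an inversion of the characteristic function (15). The sketch below is illustrative only. It assumes the characteristic function is sampled at u = 2πr/M and that the inverse discrete transform carries the usual 1/M normalization, it uses a plain O(M^2) discrete Fourier sum rather than a fast transform, and the function names are hypothetical.

```python
import cmath
import math


def compound_poisson_pmf(lam_c, theta, jmax):
    """Return [f_0, ..., f_jmax] from the direct sum in (16)."""
    f = [math.exp(-lam_c)]
    for j in range(1, jmax + 1):
        total = 0.0
        for k in range(1, j + 1):
            total += (math.comb(j - 1, k - 1) * (lam_c * theta) ** k
                      * (1.0 - theta) ** (j - k) / math.factorial(k))
        f.append(math.exp(-lam_c) * total)
    return f


def char_fn(u, lam_c, theta):
    """Characteristic function of S, as in (15)."""
    z = cmath.exp(1j * u)
    return cmath.exp(-lam_c + lam_c * theta * z / (1.0 - (1.0 - theta) * z))


def pmf_by_transform(lam_c, theta, M):
    """Invert the sampled characteristic function with a discrete Fourier sum."""
    samples = [char_fn(2.0 * math.pi * r / M, lam_c, theta) for r in range(M)]
    f = []
    for j in range(M):
        acc = sum(s * cmath.exp(-2j * math.pi * r * j / M)
                  for r, s in enumerate(samples))
        f.append((acc / M).real)
    return f


# Mean burst length 2 (theta = 0.5), 0.5 bursts per service interval.
direct = compound_poisson_pmf(0.5, 0.5, 63)
inverted = pmf_by_transform(0.5, 0.5, 64)
mean_s = sum(j * fj for j, fj in enumerate(direct))
```

M has to be large enough that the probability mass beyond M - 1 is negligible; otherwise the inverted values are aliased.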


The expected value of S is given by E[S] = E[L]E[N] 
= λ_c/θ, and the variance of S is given by 

\[
\mathrm{Var}[S] = \lambda_c (2 - \theta)/\theta^2 \tag{17}
\]

The time required to compute the probability density 
function of S, f_j, from (16) is dependent on the size of j. 
For large j (e.g., j > 1000), the computation time 
could be very large and prohibitive. A convenient and 
less time consuming way to compute f_j is from φ_S(u) by 
using the Fast Fourier Transform 7 inversion method as 
follows: 

\[
f_j = \sum_{r=1}^{M} \phi_S(r) \exp[-2\pi i r j/M],
\quad j = 0, 1, 2, \ldots, M - 1 \tag{18}
\]

where 

r = 2πu/M 

M = total number of input points to represent 
φ_S(r) = total number of output values of f_j. 

In order to accurately determine φ_S(r), it is computed 
with double precision on the IBM 360/65. Further, we 
would like to use as many points as possible to repre- 
sent φ_S(r); that is, we would like to make M as large 
as possible. Because of the word length limitation of 
the computer, double precision provides 15-digit ac- 
curacy. Therefore, when f_j < 10^-15, it is set equal to 
zero. M is selected such that f_{j>M} < 10^-16. The M's 
are different for different values of λ_c and l̄. 

The following is the set of state equations for a 
buffer size of N characters with batch renewal arrivals, 
single server, and constant output rate. 

\[
p_n = \pi_0 p_{n+1} + \sum_{i=1}^{n} \pi_{n-i+1}\,p_i + \pi_n p_0
\]

or 

\[
p_{n+1} = \frac{1}{\pi_0}\left[\,p_n - \sum_{i=1}^{n} \pi_{n-i+1}\,p_i - \pi_n p_0\right],
\quad n = 0, 1, 2, \ldots, N - 1 \tag{19}
\]

\[
\sum_{i=0}^{N} p_i = 1 \tag{20}
\]

and 

\[
p_{i>N} = 0 \tag{21}
\]

The average character departure rate from the buffer 
(carried load), α_c, is less than the average character 
arrival rate to the buffer (offered load), β = λ_c/θ, 
from the computer. The carried load can be computed 
from the probability that the buffer is idle, 

\[
\alpha_c = 1 - p_0 \tag{22}
\]

The overflow probability of the buffer with burst 
input, the expected fraction of total number of charac- 
ters rejected by the buffer, is equal to 

\[
P_{of} = \frac{\text{offered load} - \text{carried load}}{\text{offered load}}
       = 1 - \frac{\alpha_c}{\beta} \tag{23}
\]


The traffic intensity from computer-to-user is 

\[
\rho_c = \beta/\mu = \lambda_c/(\theta\mu) = \lambda_c \bar{l}/\mu \tag{24}
\]

The set of state Equations (19) is an imbedded 
Markov Chain. In the following numerical compu- 
tations, we shall assume that the character arrivals 
are generated from a compound Poisson process, i.e., 
π_n = f_n. The state probabilities can be solved iterative- 
ly and expressed in terms of p_0. From (20), we can 
find the value of p_0. Thus we find all the state proba- 
bilities. The overflow probabilities for various burst 
lengths can then be computed from (23). These results 
are presented in Figure 7, which provides the relation- 
ships (at P_of = 10^-6) between burst lengths and buffer 
sizes for selected traffic intensities. 
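
The iterative solution can be sketched as follows: start from an unnormalized p_0 = 1, generate p_1, ..., p_N with the recursion (19), normalize with (20), and then apply (22) and (23). This is a minimal sketch under the compound Poisson assumption π_n = f_n with unit service time (μ = 1). The function names are hypothetical, and the forward recursion is only suitable for moderate N because rounding errors grow with each step.

```python
import math


def burst_arrival_pmf(lam_c, theta, jmax):
    """Return [pi_0, ..., pi_jmax] for compound Poisson arrivals, as in (16)."""
    pi = [math.exp(-lam_c)]
    for j in range(1, jmax + 1):
        total = 0.0
        for k in range(1, j + 1):
            total += (math.comb(j - 1, k - 1) * (lam_c * theta) ** k
                      * (1.0 - theta) ** (j - k) / math.factorial(k))
        pi.append(math.exp(-lam_c) * total)
    return pi


def computer_buffer_overflow(lam_c, theta, N):
    """Return (state probabilities, overflow probability) for buffer size N."""
    pi = burst_arrival_pmf(lam_c, theta, N)
    p = [1.0]                                    # unnormalized p_0
    for n in range(N):                           # recursion (19)
        s = sum(pi[n - i + 1] * p[i] for i in range(1, n + 1))
        p.append((p[n] - s - pi[n] * p[0]) / pi[0])
    total = sum(p)                               # normalization (20)
    p = [x / total for x in p]
    beta = lam_c / theta                         # offered load
    alpha_c = 1.0 - p[0]                         # carried load (22)
    return p, 1.0 - alpha_c / beta               # overflow probability (23)


# Traffic intensity 0.8 with unit bursts and with mean burst length 4.
_, p_of_unit = computer_buffer_overflow(0.8, 1.0, 20)
_, p_of_burst = computer_buffer_overflow(0.2, 0.25, 20)
```

For the same traffic intensity and buffer size, the bursty input gives the larger overflow probability, which is the effect summarized in Figure 7.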

Equations (19) through (21) are reduced from Equation (2) by 
letting c = 1. 

In the above analysis, we have treated each charac- 
ter as a unit. However, in computing the expected 
burst delay, D_c, due to buffering, we should treat each 
burst as a unit. The service time is now the time re- 
quired to transmit the entire burst. For a line with 
constant transmission rate, the service time distribution 
is the same as the burst length distribution except for a 
constant transmission rate factor. When the overflow 
probability is very small, for example, P_of = 10^-6, 
then D_c can be approximated by the expected burst 
delay of the infinite waiting room model with Poisson 
arrivals and a single server with geometric service time 
(the M/G/1 model). 8, 9 Hence 

\[
D_c = \frac{\lambda_c E(L^2)}{2(1 - \rho_c)}
    = \frac{\lambda_c (2 - \theta)}{2\theta(\theta - \lambda_c)}
    \ \text{character-holding times} \tag{25}
\]

where E(L^2) = second moment of burst length, L. The 
delays are computed from (25) for selected traffic 
intensities and burst lengths. The results are por- 
trayed in Figure 8. 

Figure 7 — Buffer length vs average burst length, 
P_of = 10^-6 



Figure 8 — Traffic intensity vs expected burst queuing 
delay 


Discussion of results 

We shall first discuss the user-to-computer buffer 
behavior. Figure 4 portrays the relationships between 
overflow probabilities and buffer size for selected 
traffic intensities and selected numbers of servers. The 
curves for two-, three-, and four-servers lie in the 
region between the single and the five-server curves. 
For a given traffic intensity, the overflow probability 
decreases exponentially with buffer size. For a typical 
traffic intensity of 0.8, a buffer of twenty-eight charac- 
ters in length will achieve an overflow probability on the 
order of 10^-6. A larger buffer size is needed for ρ_u > 0.8 
in order to achieve the same degree of buffer perfor- 
mance. For a given ρ, the queuing delay increases as 
the overflow probability decreases (or the buffer size 
increases). When the overflow probability is less than 
10^-4 (for ρ_u = 0.8, this overflow probability corresponds 
to a buffer size of about eighteen characters), the delay 
increment with buffer length becomes negligible and 
the delay can be approximated as independent of buffer 
size as shown in Figure 5. 

For the data transmissions in time-sharing systems, 
the buffer overflow probability should be somewhat 
less than the line error rate. For currently available 
lines, the error rate is about 10^-5. Therefore from 
Figure 5, we know that the queuing delay range of 
interest is almost independent of the buffer length. 
Figure 6 describes the queuing delays (at overflow 
probability = 10^-6) for various traffic intensities. The 
queuing delay increases exponentially with ρ. For a 
given ρ, the queuing delay decreases with the increase of 
number of servers. Figures 4 and 6 agree with our 
intuition that whenever multiple servers are needed, 
it is always advantageous to use a common buffer 
rather than using several single lines with separate 
buffers. 

Next we shall discuss the computer-to-user buffer 
behavior. The overflow probability depends upon the 
buffer size, the traffic intensity, and expected burst 
length. For a given buffer length and average burst length, 
the overflow probability increases as the traffic intensity 
increases. For a given traffic intensity and a desired buffer 
overflow probability, the required buffer size increases 
as the average burst length increases. Figure 7 provides 
the relationships between the average burst length 
and required buffer size to achieve an overflow proba- 
bility of 10^-6 for selected traffic intensities. 

When the average burst length equals unity, then 
the result reduces to the case of Poisson arrivals, 
single server and constant service time as had been 
analyzed. 3, 4 For a given traffic intensity, the required 
buffer size for an average burst length l̄ > 1, N_l̄, to 
achieve the same degree of overflow probability is 
much greater than that for unity burst length, N_1. In 
general, N_l̄ > l̄ × N_1. As l̄ increases, the difference be- 
tween N_l̄ and l̄ × N_1 increases. For example, for ρ_c 
= 0.8 and l̄ = 1, the required buffer size to achieve P_of 
= 10^-6 is N_1 = 28 characters. When l̄ = 4, then from 
Figure 7, N_4 = 212 > 4 × 28 = 112 characters. In 
the same manner, if l̄ = 20, N_20 = 1200 > 20 × 28 
= 560 characters. This is due to the fact that the 
variance of S increases with l̄, as shown in (17). 
Figure 8 portrays the relationship between expected 
burst queuing delay and traffic intensity for selected 
expected burst lengths. For a given expected burst 
length, the expected queuing delay increases as traffic 
intensity increases; for a given traffic intensity, the 
expected queuing delay increases with burst length. 
These are important factors that affect the delay. 

Optimal design of multiplexing system 

Let us first consider the design of the user-to- 
computer multiplexer. Based on the user-to-computer 
traffic characteristics, the number of user terminals, 
maximum allowable queuing delay, and overflow 
probability, several different buffer system configu- 
rations might satisfy the desired requirements. Hence 
there are trade-offs among the number of transmission 
lines we might use, the transmission rates of the lines, 
and the buffer sizes. We would like to design the multi- 
plexing system whose total cost (transmission cost and 
buffer storage cost) is minimum. One way to proceed 
with this is first to select the set of possible multiplexing 
system configurations based on the queuing delay 
requirements from Figure 6. Based on the maximum 
allowable overflow probability, we can obtain the 
required buffer length for this set of possible multi- 
plexing system configurations. The optimal user-to- 
computer part of the multiplexing system can then be 
selected as that which minimizes the cost of the system. 

Next, we shall consider the optimizations of the 
computer-to-user multiplexer. Data collected from 
several operating time-sharing systems 10 revealed that 
the average number of characters sent by the computer 
to the group of users is an order of magnitude greater 
than the number of characters sent by the group of 
users to the computer. Thus, using a high transmission 
rate line for computer output data would significantly 
reduce the buffer size and the queuing delay due to 
buffering. Further, changes in the computer system, 
such as changes in the scheduling algorithm 11-17 in the 
central processor, can strongly influence the computer 
output traffic statistics, which will directly affect the 


buffer performance, and the design of the decoding 
system. 

In practice, we would like to design a system that 
has minimum total cost yet satisfies all the require- 
ments such as the inquiry-response delay, average 
holding time of each user, etc. Since the multiplexing 
system and the central processor intimately interact 
with each other, the multiplexing system should be 
treated as a subsystem of the time-shared computer 
system. The economical and performance optimization 
should be carried out jointly between the central pro- 
cessor and available communication facilities. 

Example 

Consider the design of a time-sharing system that 
consists of many remote terminals and that employs 
the ATDM technique with full duplex operation be- 
tween the terminals and the central processor. Measure- 
ments of the traffic characteristics from several oper- 
ating systems have revealed that the character inter- 
arrival time per user line can be approximated as 
exponentially distributed with mean about 0.5 seconds. 10 
Thus, the character arrivals can be treated as Poisson 
arrivals with a rate of 2 char/sec. A reasonable conserv- 
ative guess is that 50 percent of the transmitted 
information is sufficient for addressing and framing. 
Voice-grade private lines can easily transmit 240 char/ 
sec from users. Suppose this operating system consists 
of m = 48 terminals, and all the terminals are assumed 
to be independent and to have the same traffic character- 
istics. The buffer is designed such that the overflow 
probability is less than about 10^-6. We shall use our 
model to determine the buffer size and the average 
queuing delay incurred by each character. 

The traffic intensity is ρ_u = 1.5 × m λ_u/(c μ_u) = 1.5 × 
48 × 2/240 = 0.6. To achieve the desired overflow 
probability, from Figure 4, the required buffer length 
is 14 characters. From Figure 6, the normalized queuing 
delay due to buffering is equal to 1.25 holding times. 
Since each holding time is equal to 1/μ_u = 1/240 sec = 4.16 
milliseconds, the waiting time of each character is 5.06 
milliseconds. Now suppose the number of terminals 
is increased from 48 to 96. In order that traffic intensity 
be less than unity, two transmission lines are required 
and the traffic intensity is still equal to 0.6. From 
Figure 5, the buffer length corresponding to the desired 
overflow probability for two transmission lines is 
about 14 characters. The waiting time is about 0.8 
holding times which is equal to 3.33 milliseconds. 
Although the difference between 5.06 milliseconds and 
3.33 milliseconds may not be detected by a user at a 
terminal, a common buffer of the same size operating 
with two output lines can handle twice the number of 
input lines as with one output line. Thus, the common 
buffer approach permits handling a wide range of 
traffic without substantial variation in buffer size. 

Next, we shall consider the buffer design problem 
that employs the ATDM technique to transmit data 
from the central processor to remote terminals. The traffic 
statistics as well as the message length are different 
from those of the users. The burst interarrival time 10 
can be approximated as exponentially distributed 
with a mean of 2.84 seconds. Thus, the bursts can be 
approximated as Poisson arrivals with a rate of λ_c = 
0.35 bursts/sec. Further, data collected in the same 
study indicate that the burst length can be approxi- 
mated as geometrically distributed with a mean of l̄ 
= 20 characters. Suppose we use a wideband trans- 
mission line that transmits μ_c = 480 char/sec to provide 
communications from the central processor to m = 48 ter- 
minals. Assuming 20 percent of the transmitted 
information is used for addressing and framing, then 
the traffic intensity is ρ_c = 1.2 × m λ_c l̄/μ_c ≈ 0.84. To achieve 
an overflow probability of 10^-6, from Figure 7, we 
find that the required buffer size is 1,400 characters. 
From Figure 8, the expected queuing delay for each 
burst is 85 character-holding times, or 85/480 = 0.176 
seconds. 

Suppose now we change our transmission rate from 
480 to 960 char/sec; then the traffic intensity ρ_c ≈ 0.42. 
The corresponding required buffer size in order to 
achieve an overflow probability of 10^-6 is 480 charac- 
ters, and the delay is 15 character-holding times or 
16 milliseconds. Thus, these results also provide in- 
sight regarding the trade-off between transmission 
costs and storage costs. 
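
The arithmetic of this example can be reproduced with a short script. It is only a sketch: the helper names are hypothetical, the overhead factors 1.5 and 1.2 follow the addressing and framing allowances stated above, and the burst delay is evaluated from formula (25) with unit service time. The buffer sizes are read from Figures 4, 5 and 7 in the text and are not recomputed here, and delays read from Figure 8 need not coincide exactly with values computed from (25).

```python
def traffic_intensity(overhead, terminals, rate, mean_length, lines, line_speed):
    """Offered characters per second divided by total line capacity."""
    return overhead * terminals * rate * mean_length / (lines * line_speed)


def burst_delay(rho_c, mean_burst):
    """Expected burst delay in character-holding times, from (25)."""
    theta = 1.0 / mean_burst
    lam_c = rho_c * theta            # bursts per character-holding time
    return lam_c * (2.0 - theta) / (2.0 * theta * (theta - lam_c))


# User-to-computer: 2 char/sec per terminal, 240 char/sec lines.
rho_u_48 = traffic_intensity(1.5, 48, 2.0, 1, 1, 240)
rho_u_96 = traffic_intensity(1.5, 96, 2.0, 1, 2, 240)
holding_time_ms = 1000.0 / 240

# Computer-to-user: 0.35 bursts/sec per terminal, 20 characters per burst.
rho_c_480 = traffic_intensity(1.2, 48, 0.35, 20, 1, 480)
rho_c_960 = traffic_intensity(1.2, 48, 0.35, 20, 1, 960)
delay_480 = burst_delay(rho_c_480, 20)   # in character-holding times
delay_960 = burst_delay(rho_c_960, 20)
delay_960_ms = 1000.0 * delay_960 / 960
```

Doubling the line speed halves the traffic intensity and reduces the computed delay by a much larger factor, which illustrates the trade-off between transmission cost and storage cost noted above.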

The above example is based on the output traffic 
characteristics of a specific computer scheduling algo- 
rithm. As the output traffic statistics change with 
different scheduling algorithms, the buffer performance 
in the multiplexing system is affected. To design an 
optimal system, we should jointly optimize the sched- 
uling algorithm and the multiplexing system so that they 
yield minimum total cost and also meet the required 
system performance, such as maximum allowable 
inquiry-response delay, desired overflow probability, 
etc. 

CONCLUSIONS 

Queuing analyses indicate that for an allowable over- 
flow probability and queuing delay, moderate buffer 
sizes can be achieved for asynchronous time division 
multiplexing for time-sharing computer systems. 


Further, when multiple transmission lines are required, 
better buffer performance will be achieved by using a 
common buffer rather than by using separate ones. 

Because of the asymmetric nature of the traffic 
characteristics of user-to-computer transmission versus 
computer-to-user transmission, a much larger buffer 
is required for the computer-to-user multiplexer to 
handle the larger volume of data generated by the 
central processor. 

The multiplexing system and the central processor 
in a time-shared environment directly interact with 
each other. To design an optimal operating system, 
we should jointly optimize the central processor and 
the multiplexing system (for example, the interaction 
between scheduling algorithm and buffer performance) 
to obtain a minimum cost system that meets the system 
performance requirements. It is apparent that closer 
coordination between the computer and communi- 
cation system designs would be fruitful in terms of 
economics and technological improvements to the 
overall system design. 
The involved generation — Computing 
people and the disadvantaged 

by DAVID B. MAYER 

IBM Systems Development Division 
White Plains, New York 


INTRODUCTION 

Motivated computer professionals all over the United 
States have undertaken a most special and extraor- 
dinary task: they are involving themselves in every 
way possible in the training of disadvantaged and edu- 
cationally-deficited men and women from the so-called 
ghetto and poverty areas of the country. They are 
exhibiting a special and wonderful tension which im- 
pels them to appear at that interface between their own 
computing community and those underprivileged who 
wish to enter it. 

As Chairman of the new ACM Committee On Com- 
puting And The Disadvantaged (ACM — CCD) I have 
been privileged to visit or directly participate in ten 
projects in New York City, Boston, Los Angeles, San 
Francisco, Sacramento, St. Louis, and Philadelphia. 
From them can be drawn some broad brush pictures 
of such projects, some of their special problems, and 
their relative probabilities of success. 

The disadvantaged — Who are they ? 

The term “disadvantaged” was originally coined in 
connection with educational grants from the government, 
for potentially very bright youths from poverty 
backgrounds for experiments in educational techniques 
programs. Since that time, it has broadened to in- 
clude all those who are educationally-deficited (and 
with minimal hope of retrieval of those years they are 
behind), including those from both poor white and 
non-white communities. 


Typically, computer projects have undertaken to 
train some of the disadvantaged either as operators or 
programmers. Generally the participants have been 
characterized as follows : 

• 19-23 years old 

• dropped out of ninth or tenth grade 

• are black or brown 

• are two to three years behind their white counter- 
parts who are at the same grade level in terms of 
tested comprehension 

• about two-thirds male 

• have a job of some kind, but are underemployed 
apparently by reason of race or language 

• come from a poverty-stricken area, often an ur- 
ban “ghetto” 

• have police records in about one-third of the cases 

• evidently have some motivation to better them- 
selves 

• have children or heavy “family” responsibilities 

• on aptitude tests score over the complete range 
from high to low 

More particularly though, a review of some other 
statistics may help us to orient ourselves: 17 

For Negroes in the 25-34 year old age bracket : 
• 47.0 percent dropped out before graduation from 
high school 

• 45.6 percent completed high school 

• 7.4 percent completed high school and college 

• A Negro sixth grader was 2 1/2 grade levels behind 
his white counterpart in general scholastic 
achievement 

• A Negro ninth grader was three grade levels behind 
his white counterpart 

This three-year deficit picture persists, through 12th 
grade and graduation, in general. 

Remediation, restructuring, and ‘relevancy’ 

What does this mean to the computer training course, 
or to the jobs which people with such backgrounds can 
undertake? 

It means some tutoring in the technical concepts 
during the computer operator’s or other courses. It 
almost certainly will mean lengthening the course deliberately. 
Currently computer operator and programming 
(usually Cobol, by the way) courses run two to 
five times longer than the equivalent course given in 
the regular industrial milieu. 

It means teaching only ‘relevant’ material, only 
the guts of content, only that which is directly ap- 
plicable to that job waiting at the end of the course: 
ergo, no frills. 

It means employers will have to restructure some 
jobs, in smaller, less complex, carefully detailed clus- 
ters, so that a rather straight-forward set of behaviors 
can be carried out by new employees. 

It is possible to take small top level segments of the 
disadvantaged populace and train them directly in 
computer tasks without remediation. But in general if 
we want to really dig into the American dilemmas of 
today, remedial training will be needed for any broad 
training program developed to bring students up to 
the level of comprehension needed to understand some 
of the computer concepts of our more abstruse computer 
texts. 

Trade-offs in training 

There is, then, a kind of balance of course content 
requirements versus several variables — principally time 
— which one can invest to obtain effective training and 
eventual on-the-job performance results. 

For example, most disadvantaged projects teaching 
key punch operators required that trainees be able to 
type 20 to 40 words per minute prior to entering key 
punch classes. 2,3,4 Where the normal keypunch class is 
five days, in projects for the disadvantaged they run 
15 to 20 days. 

A project choosing high school graduates can train 
computer operators quite effectively, and include 
theoretical material on operating systems, program- 
ming techniques, the internal supervisor/program 
coupling within the computer, enough so as to allow 
an operator to make some reasoned judgements in 
error situations. This is obtained through trade-offs 
such as (a) lengthening the course, or (b) intensifying 
the hands-on experience. This probably gives the disadvantaged 
person who graduates one of the finest 
running starts in ‘operations’ in the country. (N. B. 
Particularly true of the Urban League/IBM/Bank of 
America project in Los Angeles. 4) 

The placement problem and the assumed job market 

Most projects have been located in large urban, 
highly computerized geographical areas; groups in 
the planning stages have typically looked about them- 
selves and faced the combinatorial possibilities of 
probable jobs available and probable people they were 
hoping to train. Almost invariably they concluded that 
three possible combinations were feasible : 

• key punch operator 

• computer operator, either as a trainee handling 
tapes and discs and peripherals primarily, or as 
a trainee console operator. 

• a trainee Cobol programmer 

Generally rejected for training were job descriptions 
which involved: 

• Fortran or basic language programmers 

• pure EAM or “unit record equipment” operators; 
however, this was sometimes appended to the 
computer operator trainee position description 

• tape librarians, dispatchers, I/O clerks, and the 
like. 

Most projects made only a cursory pass at the actual 
placement planning question and generally assumed 
that any graduates they offered the marketplace would 
be snapped up with only a modicum of effort to find 
interviews. Inevitably, halfway through the training 
when efforts turned toward placement interviews 
there were some rather rude awakenings to several 
facts: the students’ color, language, and prior records 
were obstacles that required active selling to over- 
come. More often than not there was a mad scramble 
toward the end of the training period to find employers 
willing and able to hire trainee computer operators 
from the poverty sector of our population. Only heroic 
efforts upon the part of placement committees would 
slowly find openings for interviews, much less pre- 
committed employment slots. 

Hence, if there were one piece of advice this author 
would give, it would be: plan your placement process 
first; involve would-be employers at the earliest planning 
stages to test the marketplace, to involve them in 
the training stages, to be interested in the graduates, 
and to assure jobs at the end of the course. It is almost 
axiomatic that if you should fail to place your ‘dis- 
advantaged’ trainee within a very few weeks of his 
graduation you may have lost him or her forever and 
all the training investment will have been for naught. 

The computer operator — What is he ? 

In order to converge upon concrete results and make 
some comparisons only the training surrounding the 
job of Trainee Computer Operator will be described. 

Let us consider three different, but related, aspects 
of the Computer Operator position description: 

• EAM or “unit record equipment” knowledge and/ 
or ability; 

• pure computer operating, highly structured, highly 
practical, based on detailed specified stimuli and 
response patterns; 

• an “understanding” computer operator, who has 
sufficient theoretical knowledge about operating 
systems to solve unexpected error situations, so as 
not to abort, but successfully run a job. 

In a typical job description, the Trainee Computer 
Operator works under close supervision, performs the 
simpler operations on peripheral devices and on the 
console, expedites the data in and out of the system 
and the installation, and is generally a careful intel- 
ligent follower. He is usually expected to have two 
years of college (possibly an AA degree) or several 
years of tabulating machine (EAM) experience or 
a 200-hour hands-on computer operating course. 16 

The Journeyman Computer Operator is expected 
to do more; based upon six or more months of actual 
Trainee experience, he checks input and output for 


general results, analyzes stops and takes corrective 
action, and runs test programs. He is also required to 
know the principles of operations, basic elements of 
programming, follow directions carefully and analyze 
data, and perform arithmetic computations. 

It is the author’s contention that the EAM tasks 
and training are frills and basically obsolete and should 
not be taught (excepting a little keypunching for error 
corrections); that the second or “structured, practical” 
job description is the one for minimal entry level jobs 
for the disadvantaged; and that the third description 
adds a requirement for “theoretical understanding” 
for computer operator training projects. This latter 
requirement is significantly high in terms of language 
comprehension and acts as a deterrent to large sectors 
of the disadvantaged population trying to take advan- 
tage of the training. 

It is interesting to note that in almost every case of 
the disadvantaged training projects with which the 
author is familiar, nowhere near such stiff conditions 
are placed upon the students, either for entry into the 
computer operator course or for eventual 
hire. And there is every indication they can perform 
successfully upon the job with considerably less stringent 
qualifications. 2,12 

The author therefore urges that to be able to develop 
the truly disadvantaged, educationally-deficited person 
(a dropout from as low as the ninth grade), computer 
installations should restructure their basic 
computer operator job specifications, and training 
projects their course content, to reflect the entry-level 
requirements for the “practical, structured” computer 
operator trainee. This would give gainful employment 
of a meaningful type to many more people in the total 
community, particularly the disadvantaged. 

Computer operator curricula 

In this section are described two examples of opera- 
tor curricula to exhibit the basic approaches, typical 
content, and a preliminary view of some of the training 
techniques employed. (Treated more fully in another 
section.) 

In the Mitre Corporation’s fully in-house, fully 
funded (internally) on-the-job (OJT) format 9 students 
are paid at regular industrial rates. Remedial training 
in basic language skills and mathematics takes up 
most training hours daily, for the first few of the 26 
weeks total. Gradually it is replaced by computer 
operations training on both the IBM 7030 (Stretch) 
and the IBM 360/30 and 360/40 systems; and of 
course gradually the students work out on the line. 
Instructors are internal, paid, staff members; four 
students started and three finished successfully. They 
were part of a 12-trainee Mitre project for clerks, 
operators, and the like. 

The salient features of this Project’s outline include: 
(a) deliberately assigning their second shift Supervisor 
for nine months as Training Coordinator to prepare the 
technical curriculum, instruct, supervise the OJT 
aspects, and coordinate with the remedial training; (b) 
giving all first shift personnel a stake in the outcome, and 
including them in the evaluation process; (c) providing 
separate remedial training on a descending scale 
concurrent with increasing line operations training 
and experience. A full Outline is available from the 
ACM. *, ** 

The second basic approach and the one most often 
used, was the external, separate training program; it 
is typified by the CPDA project in New York City. 1 
Using a “self-selection” process (Ref. 12), 75 prospective 
students went through an ‘orientation’ to computer operating, 
and then 48 volunteered for actual training. Thirty-two 
stayed with it, 27 graduated on the first round, 
three more were tutored to completion, and 17 of the 
30 were placed as of this writing. 
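
The attrition figures just quoted can be tabulated stage by stage. The following is a minimal sketch; the stage labels and the function name are ours, while the counts are those reported above for the CPDA project.

```python
# Stage-by-stage retention for the CPDA counts quoted in the text.
# Stage labels and the function name are illustrative, not from the project.

CPDA_STAGES = [
    ("attended orientation", 75),
    ("volunteered for training", 48),
    ("stayed with training", 32),
    ("graduated (27 first round + 3 tutored)", 30),
    ("placed in jobs", 17),
]


def retention(stages):
    """Return rows of (label, count, share of previous stage, share of first stage)."""
    rows = []
    first = stages[0][1]
    previous = first
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, step, overall in retention(CPDA_STAGES):
        print(f"{label:40s} {count:3d}  {step:6.1%} of previous  {overall:6.1%} of start")
```

Measured against the 48 who volunteered for training, the 30 graduates represent 62.5 percent.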

The program, approach, and Syllabus Outline of 
CPDA are given in Figures 1 and la. 

EAM/Unit Record equipment training is not given 
in this course. Several of the courses did offer as much 
as a week’s equivalent of such training, on the basis 
of its continuing relevance to today’s card-oriented input/output 
part of the computing world. The Urban 
League/IBM/Bank of America project in Los Angeles, 
the Philadelphia ACM/Board Of Education project, 
and the St. Louis IBM/Board of Education project are noted 
in particular. 4,10,11 The latter have a regular EAM 
course available in their vocational schools as well. 

Training techniques 

It would seem obvious that some specialized training 
techniques would have to be employed to reach dis- 
advantaged or educationally-deficited people, and a 
few such techniques have been attempted in the com- 
puter training field. Such experiments should be carried 
out in a professional, measured, feedback atmosphere, 
but rarely has that been available. In the projects 
studied, probably the three aspects of training that 
have paid off the most are : 

• lengthening the courses by two to five times the 
average; 

• making the classes small; or alternatively assigning 
two instructors per class to bring the 
pupil/teacher ratios down to as low as four-to-one; 

• allowing the class to teach itself, to a certain extent, 
by teaming or as a full group. 


1. ORIENTATION PROGRAM 

a. Registration 

• welcome 75 prospective students 
• describe project 
• history and needs of the computer field 
• introduction to the computer 
• film on computer and operations. 

b. Introduction To Computer And Business Environment 

• devices used at an installation 
• business environment and general working conditions 
• employment prospects for the computer operator. 

c. Computer Installation Visit 

d. The Computer Operator And The Training Program 

• computer concepts 
• general responsibilities of the operator on the job 
• the operator’s relationship to the computer field 
• the training program 
• general discussion and individual counseling. 

Extensively, throughout these orientation sessions 
instructors and counsellors mingle with the students, 
interact on questions, and encourage self-selection into, 
or not-into, the actual training. 

Figure 1 — A no-frills training syllabus. Computer 
Operations Training for the IBM S/360 Models 30 and 
40. Training consists of two parts. Since motivation 
and interest are prime factors in training completion 
and job performance, a preliminary four-session 
orientation program was designed to give the candidate 
a data base for making up his own mind, to enter or 
not-enter training. (After CPDA 1,13). 

Interestingly enough, no project used any specialized 
audio-visual material (other than hands-on work with 
the computer itself), most of them depending upon 
the standard available texts, programmed instruction 
books, or books of illustrations. 

2. COMPUTER OPERATIONS TRAINING PROGRAM 

a. Course Structure 

• The training program will consist of both 
classroom sessions and computer room visits. 

• It is expected that approximately 30-45 students 
will complete the Orientation Program 
and enter the Operations Training Program. 

• There will be three sections, each with 10-15 
students. 

• Each section will have one primary teacher 
and one assistant teacher. 

• Classroom sessions will meet twice a week for 
two hours. 

• Computer room visits will be scheduled as 
required by the Syllabus and will be from two 
to three hours in length. 

b. Educational Material. The basic student text 
for the course will be: 

IBM System 360 Model 30 DOS System 
Operation Training Manual and Book of 
Illustrations (Student Text); Forms C20-1676-0, 
C20-1677-0. 

• Examples of I/O media will be available in 
the classroom for student familiarity with 
cards, tapes, disk packs, printer forms, carriage 
control tapes, etc. 


SYLLABUS 

Section A (INTRODUCTION AND PERIPHERALS) covers: 

• Introduction to Input/Output Media 
• Computer Room Procedures 
• Computer Room Visits (hands-off demonstrations 
and hands-on practicums) 
• Operations of Peripheral Devices 
• General Review 

Section B (SOFTWARE INTERFACE) covers: 

• Introduction to “Operating Systems” 
• Control Information 
• Operator Interface With DOS (Disk Operating System) 
• Computer Room Visits (hands-on practicum) 
• Stand-Alone Programs 
• Compatibility Modes — Emulation 
• Course Review 

Figure 1A — A no-frills training syllabus (cont’d). Note 
the absence of EAM/unit record equipment training, 
and a maximum of immediately-applicable job 
knowledge given in a 54-hour course over a period 
of 2.5 months (20 sessions). (After CPDA 1,13). 

More detailed versions of this and other 
curricula, syllabi, and lesson plans are available 
through the ACM Committee on 
Computing and The Disadvantaged 
(ACM-CCD). 


Nevertheless, a kind of experiment did take place 
in the CPDA project, observed by the teachers, staff 
professional guidance counselors, and the students 
themselves. It involved trying three differing teaching 
techniques: 

1. The ‘classical’ approach consists of a teacher 
lecturing to his students, with the teacher as 
focus for feedback (answers, discussions, questions). 
This can be characterized as a ‘vertical’ 
organization of class structure. 

2. The ‘teams’ approach consisted of the teacher 
breaking up the group into five teams of three 
students each. This came about to solve the problem 
of demonstrating the computer console and 
peripherals effectively. Having 15 students stand 
around in a large semi-circle proved boring and 
ineffectual; by placing the few most hep students 
with two of each of the others, he could in effect 
assign problems to teams to work out, and allow 
students to teach each other within teams. 
When competition rather than cooperation 
started to raise its head, the team members 
were rotated. In addition, the instructors, after 
giving the teams a problem, deliberately gave 
the impression that they would answer no 
further questions. After computer runs the 
whole class would hold a post-mortem. Instruc- 
tors also created unexpected problems, such as 
casually dereadying a printer, or flipping a 
tape into ‘file protect’ mode without a file 
protect ring being inserted. Furthermore, stu- 
dents would be called upon at random at the 
beginning of a class to recapitulate the previous 
session’s work and lessons, taking the 
instructors' place in essence. The remainder 
of the class usually jumped in to help the hap- 
less classmate — after waiting an appropriately 
gruesome few minutes. This ‘Team’ process, 
a combined ‘horizontal’ and ‘vertical’ class 
structure, and the random ‘instructor', all 
created an involvement within the class. It 
worked, and beautifully; in fact the class got 
ahead of the syllabus. 

3. The ‘fully horizontal’ or ‘group/workshop’ approach 
was occasionally attempted by the third 
pair of instructors. This normally involved the 
teacher bringing much of the material to the 
students' attention via lectures and some 
reading, but required that answers to problems 
and operations, come from the class as a group. 
In this particular instance, the structure worked 
fairly well, the class completed the material on 
schedule, but as an experiment it was relatively 
inconclusive. This was partly because the 
amount of lecture required and the individual help 
given were more than are normally used in a true 
‘horizontal’ workshop situation. That is, in this 
case, the technique never got a thorough 
workout. 

To summarize: the lecture technique worked fairly 
well on the brighter students, who expected it as a 
matter of previous exposure. Their class suffered the 
greatest number of dropouts, but not from the training 
technique used. 

The ‘teams' approach was very effective for both 
morale and learning. The class was able to cover a few 
items the others didn't. 

The ‘interactive’, ‘fully-horizontal’ group organized 
as a workshop also finished, and reasonably well, certainly 
comparable to the others in content knowledge. 
But it is predicted that the stronger extension of that, 
the new Montessori/workshop group involutional 
methods should give far better results for disadvantaged 
people, especially when the staff and the facilities can 
be structured properly. 16 

The Montessori environment requires careful guid- 
ance upon the part of the instructor, and a special quality 
of allowing the class to explore freely the alternative 
paths to answers. The instructor, in a sense, must be 
willing to subdue his usual position of center-focus role, 
become a part of the discussion, part of the group, al- 
most at their own level. Over a period of weeks, the 
group should become highly interactive, over the 
material, over technical and occasionally external life 


problems, and must be handled carefully. It has been 
used very effectively for teaching programmers and 
systems analysts (advantaged), 8 and it is strongly urged 
that the Montessori techniques and environment be 
attempted on the disadvantaged in all occupations. At 
least one project, the Sacramento/ACM Education 
Committee, 15 is planning to use it for a computer 
operators' course. 

Training and stability 

There is still one more important aspect of training 
which will aid a project immeasurably: the main- 
tenance of continuity of warm, stable teachers with 
whom the class can identify, and the assured conti- 
nuity of class sessions, the same physical facilities, 
knowing that the class is going to meet, and there will 
be a job waiting at the end of the course. Changing 
classrooms every few weeks and uncertainties of com- 
puter time when promised try the motivations of the 
students (and instructors) sorely, at times. Those 
situations which had good steady facilities, the same 
instructors throughout, (usually paid, and profes- 
sional at teaching itself) have the highest attendance 
and morale levels. Though these items go almost without 
saying, the proliferation of volunteer projects impels 
the author to issue this type of warning, for the sake of 
everyone involved, particularly the disadvantaged 
students; they have been through enough instability 
in life already. 

Curricula comparison: Methodology of evaluation 

Now that some specific curricula have been presented, 
we wish to set down some criteria and the method by 
which we will compare the various content and tech- 
niques; to do so we have prepared ourselves in the pre- 
ceding paragraphs with the job specifications, the 
required curriculum content, and the training ap- 
proaches. We consider some of the following points of 
comparison expanded in Figure 2. 

• Is content aimed at the structured, stimulus-
response, practical type of course? 

• Are the results of the course immediately appli- 
cable to a job in a computer installation? 

• Does curriculum allow a “flexible tail” so that 
graduate can go to work in an installation that 
has computers and operating systems other than 
the particular one taught? 

• Does the course lead into on-the-job training 
(OJT) easily? 

• How are the results of training measured? 

• How is the post-graduate performance measured? 

• How much time is given in classroom lecture? In 
hands-on experience per student? 

• Is there a “no failures” policy of teaching? 

• How much supportive remedial help is built into 
curriculum? 

• Are non-English students helped? 

• How many teachers per student? 

• Course content: was EAM/Unit Record equipment 
taught fully, or was only the keypunch 
taught (for computer room use)? Was operating 
system taught as button-pushing course? Or was 
a ‘theory of operating systems’ taught in addition, 
and the relationship between the resident applications 
programs and the operating system concepts 
taught? 

FIGURE 2 — Comparison of curricula for computer operator 
training projects for disadvantaged peoples 

With these criteria and questions in mind, a chart 
of the Yes/No/Comments type, (Figure 2) gives a pic- 
ture to the reader of the various projects, and their rel- 
ative strengths and weaknesses. 

Performance criteria — Some measures of the projects * 

If one were to attempt to measure the results of such 
training for the disadvantaged, one might look to the 
annual salaries accruing, of those who obtained jobs, 
versus the expenditure for the project. MWSDPS 16 
suggests that for approximately $14,000 they graduated 
28 students of all types (key punch and computer 
operators, and programmers), and placed 20 of them, 
for a job value of $98,000 annually, relieving the wel- 
fare rolls of eight people at the same time. By the same 
token, CPDA graduated and placed 17 computer 
operators, who now earn about $74,000 per year, all 
for something less than $1,000 cash, but using six 
teachers, eight guidance counselors, and about a dozen 
more in placement, tutorial, curriculum development, 
measurement, and find-a-computer chores, all 
volunteer. 12 
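
The salary-versus-expenditure measure described above reduces to simple ratios. The sketch below is illustrative only: the class and field names are ours, the MWSDPS cost is the approximate figure given in the text, and the CPDA cash figure is entered as $1,000 although the text says only that it was something less than that. The CPDA ratio is therefore a lower bound, and it ignores the value of the volunteer time.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    cash_cost: float        # dollars spent on the project
    placed: int             # graduates placed in jobs
    annual_salaries: float  # combined annual salaries of those placed

    def salary_per_placement(self) -> float:
        """Average annual salary of a placed graduate."""
        return self.annual_salaries / self.placed

    def salary_to_cost_ratio(self) -> float:
        """Annual salaries generated per dollar of cash spent."""
        return self.annual_salaries / self.cash_cost


PROJECTS = [
    Project("MWSDPS", 14_000, 20, 98_000),
    # ASSUMPTION: cost entered as the stated upper bound of $1,000.
    Project("CPDA", 1_000, 17, 74_000),
]

if __name__ == "__main__":
    for p in PROJECTS:
        print(f"{p.name}: ${p.salary_per_placement():,.0f} per placement, "
              f"{p.salary_to_cost_ratio():.1f}x annual salaries per dollar spent")
```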

In another performance measure, it is evident that 
the more deeply into the social problems’ fabric a 
project wishes to penetrate, the more ‘underpinning’ 
or training support techniques in which one must invest energy 
and staff: these include tutoring, remedial training, 
high school equivalency aid, teaching English as a 
second language, and both vocational and life-guidance 
counseling. The volunteer projects expend 10 to 20 


supportive hours for every student class hour. 12 Pro- 
fessional staffs and funded projects tend to use a lower 
ratio of time (about one-to-one) but this is often bal- 
anced by a much larger expenditure of up to approxi- 
mately $1,000 per student, involving facilities and 
professionals. 

Other measures, such as performance on the job, 
have been followed up by the projects too cursorily to 
warrant reporting at this time. 

The ‘shadow programming aide’ 

It seems to this writer that there are considerably 
more jobs available at a slightly higher level of com- 
plexity: that of a ‘shadow programming aide’. 27 Not 
just a coder, this person works in shadow relationship 
to a regular programmer, carrying out some of the 
more onerous details of programming, such as flow- 
charting from given coding, setting up debug runs, 
keypunching, expediting the debug process, or carrying 
out some of the detailed, but highly supervised coding. 
This kind of programming technician could very well 
have real upward mobility, dependent primarily on 
the trainee’s learning rate, general intelligence, in- 
terest, and proven ability. The national crush for pro- 
grammers is far greater currently than for operators; 
and in addition, the direct personal involvement of a 
disadvantaged person with a regular, stable, accepting 
programmer or two, would be one of the best stabilizing 
entries into the computing field. 

SUMMARY— PITFALLS AND SUGGESTIONS 

A number of points may be abstracted from the 
foregoing, in addition to others not made in the main 
text: 

• The ‘typical project’ for training disadvantaged 
people is created by a highly motivated group of 
computer (and other) professionals, with a desire 
to get involved, to do something. 

• The question each project must face is: just how 
deeply into the social problems it wants to delve : 
underemployed, unemployed, educationally-def- 
icited, hard-core unemployed, or whatever. 

• Most computer training projects aim at the 19- 
28 year olds, who are two or three years behind 
in their education, and won’t need very much 
remedial training to get them through a computer 
operator (or programming) course and into a mean- 
ingful job. 
• It is a surprising note, but there seem to be relatively 
few trainee computer operator courses in 
the country: mostly these have been given on the 
job (OJT), or from reference manuals supplied 
by the manufacturers; leastwise the author has 
detected very few. This points up a real economic 
advantage for the prospective employers: very 
little OJT need be spent to start these employees — 
they’re ready to go from Day One. For once, in 
this case at least, the disadvantaged are probably 
starting out ahead of their advantaged brethren. 

• The first psychological jolt for motivated whites 
is to discover they will not, most of them, be acting 
at the actual ‘interface’ between the disadvan- 
taged and the advantaged community. More of 
them will have to find psychic reward in support 
functions, such as finding employers with available 
jobs, writing curricula, obtaining free computer 
time, finding teachers and class space, obtaining 
funds. 

• The second psychological jolt comes in discovering 
how much planning is to be done (or should have 
been done). 

• The third jolt is the marketplace: obtaining job 
slots requires persuading people outside your lit- 
tle narrow project, and it’s tough. Start very early 
to involve would-be employers, even at the plan- 
ning stages. 

• Courses, so far, have usually involved computer 
operator training on IBM machines, particularly 
the 360 series, in DOS or OS (Disk Operat- 
ing System or Operating System/360). Pro- 
gramming usually means Cobol, rather than basic 
languages. A way to teach a ‘core’ curriculum applicable 
to any operating system for any manufacturer’s machine 
needs to be developed. Such 
a step has begun to take shape with the ACM- 
Philadelphia/Board of Education project, using 
the ‘flexible tail’ method for phasing to OJT. 
(Ref. 11). 

• The single greatest lack is funds. They are hard 
to get. Working with the local Board of Education 
to obtain government funds, or with local industry 
for underwriting expenses privately, seems to 
be most effective. 

• Volunteer projects have usually fallen short of their 
total objectives while succeeding partially, and they 
tend to peter out. Plans for real continuity must 
be built in. 


• One cannot stress too much the responsibility we 
have in changing the lives of would-be trainees. 

• The one thing training projects are prepared for 
is attrition; their expectations are that they will 
graduate 30 or 40 percent of the people who arrive 
at their door for the first formal class day. It 
works out that way, though some projects have 
gotten over 80 percent. 

• What is needed for the classes is a new, simple, 
straightforward text on operating a computer writ- 
ten in the language which the disadvantaged can 
understand. It may have to be aimed at the ninth 
grade level, for both the English and the commer- 
cial algebra comprehension. It could start with 
what they know already: numbers systems can 
come from the numbers game, and you could go 
on from there. Here’s a set of books someone could 
write, and the whole computing profession may 
benefit. 

• Serious consideration of new, less complex, job 
descriptions can be attained in the operations and 
the programming area. It is up to employers and 
the industry to develop them, in order to include 
a larger part of our population. 

• Finally, the author strongly urges that the com- 
puting community initiate a national broad-based 
organized effort, to develop jobs, regular training 
projects, and adaptations of training techniques 
for disadvantaged to enter the field. Two major 
proposals are now before the ACM Committee on 
Computing and the Disadvantaged: one from the 
Sacramento Chapter, one of the few ACM chapters 
actively pursuing a generalized, funded-plus- 
volunteers approach; and the other from ACM’s 
Special Interest Group on Computer Personnel 
Research (SIGCPR) for a Massive Training 
Project, involving 50 cities and 1000 students 
per year, and fully documented, measured re- 
search on selection, training, and performance 
within such projects and their graduates. 

All these are part of bringing students from the dark 
into the light, to help them enter the world of working 
and earning peoples, to stand on their own two feet 
with self-respect and dignity; in the words of the 
Prophet Micah, “that they shall sit every man under 
his vine and under his fig-tree, and none shall make 
him afraid.” 
APPENDIX A 

Computer training projects for the disadvantaged 

— Brief characterizations and descriptions 

1. NEW YORK CITY: CPDA (Computer Professionals 
Development Association); Fall, 1968. Three 
parallel classes of 16 men each, Computer Operators 
only; completely volunteer teachers and staff. First 
pilot course completed November, 1968, hopefully 
leading to a funded and/or at least a teachers-paid 
project. Dr. Allen Morton, IBM/SRI, NYC, President. 

• Unique aspects: candidates self-selected based 
upon own interests after four-session “orienta- 
tion.” Three differing teaching techniques tried: 
classical lectures, two- and three-man teams, and 
semi “workshop” group involutional approach, 
yielding differing results. Strong professional guidance 
counseling, heavy tutorial aid, and high school 
equivalency training available. 48 started, 32 
finished, 17 placed. Cost less than $1,000; jobs 
worth $74,000 per year. 

2. NEW YORK CITY: MWSDPS (Middle West 
Side Data Processing School); with Puerto Rican 
community group; a semi-funded project; started with 
ACM, then became industry-supported and volunteer. 
Summer, 1968. L. Barnett, Long Island University, 
Director. 

• Unique aspects: started with 164 off-the-street 
prospects through advertising, etc. Interviewed 
applicants for “logic capability”, “motivation” 
and language comprehension. Started 19 pro- 
grammers in COBOL, graduated 14, placed 9. 
Started 17 key punchers, graduated 8, placed 
7; started 12 computer operators in IBM 360/DOS, 
graduated 6, placed 4. Cost about $14,000, and jobs 
worth $108,000 annually. 

3. NEW YORK CITY: Harlem; a series of IBM 
keypunch operator courses organized by W. DeLegall, 
Columbia University Computing Center. Basic lesson 
learned: after first course taught basic typing from 
scratch as preliminary to keypunch training, subse- 
quent classes required candidates to have 40 wpm 
typing skill before entering. 

4. LOS ANGELES: ULDPTC (Urban League 
Data Processing Training Center), 7226 S. Figueroa 
Ave.; jointly sponsored by League, with professional 
IBM teachers and IBM donated equipment, in Bank 
of America donated building. Urban League both 
selected and placed candidates. This is the most professional 
and thoroughly equipped computer field training 
project for disadvantaged in the country. Supported 
completely by private/industrial funding. John O. 
Adams, (IBM), Training Director. 

• Unique aspects: three parallel courses in key- 
punching (12 people, four weeks), computer oper- 
ations (IBM 360/30, DOS, 12 students, six weeks), 
programming (COBOL, 12 students, 12 weeks). 
Full daytime staff and students; no student sti- 
pends. Has full IBM 360/30 with tapes, discs, 
printer, card reader/punch, dedicated to project 
only (i.e., no production, only classes). Runs two 
years, about 250 students per year. Attrition rate 
very low (5 percent-20 percent) and placement 
rate very high. 

5. LOS ANGELES: Maywebb Sciences Corp. Orig- 
inally a Watts area volunteer project, spearheaded by 
Louis Webb. Has graduated programmers primarily. 
After two years is offering courses on regular paid 
“private EDP school” basis, and to private industry 
on government funds. 

6. LOS ANGELES: Operation Bootstrap: part of 
Watts area self-help in manufacturing and retail 
stores. Also started as key punching classes and pro- 
gramming; 47 enrolled in latter, plus remedial training. 
Founder: Louis Smith. 

7. SAN FRANCISCO: IBM; computer operator 
course (EAM + DOS) completely staffed and equipped 
by IBM's Branch Office Education Center; con- 
centrated on somewhat older group (average age = 
28.5 years, top is 45 years), a good portion of whom 
had seen tab equipment before. High percentage of 
police records and unemployed. Very successful em- 
ployment placement. Director, Philip Braverman, 
IBM San Francisco. 

8. SACRAMENTO: ACM CHAPTER, Education 
Committee. Specially planned computer operator 
training project, to begin classes in September 1969. 
Includes long, detailed market study of job specifica- 
tions to determine job placement availability and 
committed slots for graduates in state government 
and private industry. Detailed employer/project in- 
teraction; year-long careful planning in all phases 
systems analysis approach; use of PMS (Program 
Management System) for scheduling; professional, 
external structured measures of selection, evaluation 
of curricula and training performance, being woven 
in from the start; ‘Montessori/workshop’ group involutional 
training techniques; paid teachers; possible 
student stipends. Totally ACM-directed project, with 
industry/government cooperation. Organizer: Elizabeth 
R. Alexander. 

9. BOSTON: MITRE Corp.; totally funded by 
MITRE: the only fully in-house OJT in this survey; 
see section on Computer operator curricula. 

10. ST. LOUIS: Board of Education and IBM; 11 
students, eight weeks, 5.5 hours per day, in program- 
ming and operations; 8 out of 11 graduates are 
either computer operators or teleprocessing clerks; 
1968 Summer project, not being repeated. Funded by 
Board of Education (Dr. Lawson, Treasurer) and 
staffed by regular IBM Systems Engineers, as professional 
teachers. Students received stipends, and 
were chosen from group of 60 not planning to go on to 
college and with average grades. Former Director: Ron 
Dobies, IBM/DPD, Clayton, Missouri. 

11. PHILADELPHIA: ACM and Board of Education; 
Delaware Valley Chapter ACM in cooperation 
with Thomas Edison High School counsellors; an 
after-school hours project (two days/week) for 20 
volunteer students; paid teachers, two “buddies” 
per student volunteer from ACM; runs about 25 weeks; 
includes EAM through full 360/DOS operations; 
furnishes some OJT at end of course to dovetail with 
prospective employers’ non-IBM computers. Director, 
Milton Bauman, ACM and Price, Waterhouse and 
Co., Phila. 
The Q approach to problem solving 


by J. D. McCULLY 

TRW Systems 
Redondo Beach, California 


INTRODUCTION 

The problem of determining derivatives on a digital 
computer has received a great deal of attention for 
several years. Some exotic systems have been developed 
and numerous papers have treated the problem. In 
1964 it was suggested by Wengert 1 that the chain rule 
could be applied to values for the determination of 
derivatives. 

This general concept has served as the basis for a 
series of programs developed at TRW Systems. It 
has been expanded to permit the essentially simul- 
taneous computation of first and second partial deriva- 
tives with respect to several independent variables. 
Second partials are especially valuable in optimization 
problems, and excellent results have been obtained 
with this technique. The first program written at TRW 
some years ago to apply Wengert’s chain rule concept 
was called ROP (for Restricted Optimization Program) 
and has been used to optimize sets of algebraic equa- 
tions. After some experience with this program it was 
decided that a complete system should be devised to 
permit wider application of the technique to problems 
where partial derivatives would be of value. The system 
was initially named CUE, for Computer Utility for 
Engineers, but was recently renamed Q in deference 
to another system named CUE. 

The intent was to make Q essentially a computer 
operating system. On the other hand, it was to be used 
within an already existing operating system (SCOPE 
2.1) on TRW’s CDC 6500 machine without modifica- 
tion to the existing system. A good discussion of this 
type of system is found in Glass. 2 The consequence 
was necessarily some added overhead operating cost, 
but it was hoped that two factors would offset this 


added cost. One of these factors was the planned 
machine-independent characteristic of the Q system 
which essentially uses only FORTRAN and FORTRAN 
routines (including I-O). In practice, some of the ma- 
chine-oriented functions of the SCOPE operating 
system proved impossible to resist and conversion to 
another machine may be less easy than was originally 
planned. 

The second factor that would make Q attractive 
despite the increased machine time was the inclusion 
of several unique features in the system. The most im- 
portant of these features is the above mentioned par- 
tial derivatives. Another is dynamic storage, and a 
third feature of interest is a macro processor for the 
input language. With this feature the system is suitable 
for use by the engineer who is more or less familiar with 
FORTRAN and wants his job done quickly even at 
the expense of some extra machine time. 

Sample problems 

Before the structure and characteristics of the Q 
system are described in detail, it may be useful to 
give some examples of the kind of problem for which 
it has proved most useful. These examples are taken 
from INTRODUCTION to SLANG. 3 In general it 
can be said that Q is suitable for mathematically com- 
plex problems. It has been designed to relieve the user 
of most of the complex calculations involved and to 
provide him with a short turnaround time that makes 
practical a series of alternate approaches or formula- 
tions. 

As an essential part of making Q user-oriented, a 
high-level language called SLANG has been evolved 
to allow easy communication with the computer by 
engineers with little programming knowledge. For 
purposes of the sample problems it is necessary to keep 
in mind that the problem statements shown are writ- 
ten in SLANG. The convenience of formulating prob- 
lems in this way will be apparent. 

The first example illustrates the use of SLANG for 
solving a typical optimization problem with nonlinear 
implicit equations imbedded in the engineering model. 
The problem is to minimize the weight of a three-stage 
liquid rocket vehicle boosting a payload from the sur- 
face of Mars. The optimum values of thrust level and 
burn time for each stage are to be determined for the 
specified mission. Total burn time, total velocity in- 
crement, and payload weight are given. The SLANG 
statements required to solve this problem are shown 
in Figure 1. 

In this problem, the quantity being minimized is 
WTOT. The statement 

    OPTIMIZE WTOT                                  (1) 

identifies the payoff function and establishes an optimization 
loop which ends with the second END 
LOOP card. The statement 

    INDEPENDENT THRUST(2), THRUST(3), 
                TBURN(1), TBURN(2)                 (2) 

designates thrust levels of two stages and burn times 
of two stages as independent variables which are being 
determined by the optimization. Equations G1 and G2 
are being solved to constrain the solution such that 
total velocity increment and burn time match specified 
values. The statement 

    SOLVE G1, G2                                   (3) 

identifies the implicit simultaneous equations being 
solved and establishes an equation solving loop which 
ends with the first END LOOP card. The independent 
variables of the SOLVE loop are identified by the 
statement 

    INDEPENDENT THRUST(1), TBURN(3)                (4) 

Even though they are expressed in terms of intermediate 
variables, the equations G1 and G2 are equivalent 
to the ultimate form 

    G1 = G1 (THRUST(1), TBURN(3))                  (5) 
    G2 = G2 (THRUST(1), TBURN(3)) 


    VARIABLE ISP(3), ISPVAC(3), TBURN(3), THRUST(3), XIP(3), 
   *         WPROP(3), WSTAGE(3), STRFAC(3), DELV(3), MR(3) 
  1 READ DATA 
    OPTIMIZE WTOT 
    INDEPENDENT THRUST(2), THRUST(3), TBURN(1), TBURN(2) 
    0LIMITS (FPRIN = 0) 
    SOLVE G1, G2 
    INDEPENDENT THRUST(1), TBURN(3) 
    DLVTOT = 0 
    W = WPAYLD 
    TBTOT = 0 
    DO FOR L = 1 TO 3 
    I = 4-L 
    ISP(I) = ISPVAC(I) * (1 - XIP(I)) 
    WPROP(I) = THRUST(I) * TBURN(I) / ISP(I) 
    WSTAGE(I) = 0.0234 * THRUST(I) + WPROP(I) 
   *            + 1.255 * WPROP(I) ** 0.704 + 4 
    STRFAC(I) = WPROP(I) / WSTAGE(I) 
    W = W + WSTAGE(I) 
    MR(I) = W / (W - WPROP(I)) 
    DELV(I) = GC * ISP(I) * LOGN(MR(I)) 
    DLVTOT = DLVTOT + DELV(I) 
    TBTOT = TBTOT + TBURN(I) 
    REPEAT 
    G1 = DLVTOT - DELVIP 
    G2 = TBTOT - TBTIP 
    END LOOP 
    WTOT = W 
    PRINT VARIABLES 
    END LOOP 
    GO TO 1 
    END 
    DATA 
    THRUST=5400, 1237, 317, TBURN=142, 127, 131, GC=32.174, WPAYLD= 
    DELVIP=2.8E4, TBTIP=400, ISPVAC=315, 315, 315, XIP=0, 0, 5E-3, 
    $END 

Figure 1 — SLANG formulation of sample optimization 
problem 


The purpose of the SOLVE loop is to find the values 
of THRUST(1) and TBURN(3) that satisfy G1 = 
0 and G2 = 0. Engine performance and vehicle weight 
quantities are computed in a loop beginning with the 
statement 

    DO FOR L = 1 TO 3                              (6) 

and ending with 

    REPEAT                                         (7) 

The equations between these two statements are used 
three times, one time for each of the three stages. Two 
characteristics of SLANG should be evident from this 
example. One is that the SLANG expressions used to 
describe the engineering model closely resemble those 
of FORTRAN. The other is that numerical algorithms 
for optimization and nonlinear equation solving are 
invoked using the commands OPTIMIZE and SOLVE. 

The total running time for this problem was eight 
seconds on the CDC 6500. The printout of the solution 
is shown in Figure 2. 
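The staged-vehicle model inside the DO FOR loop of Figure 1 can be re-evaluated outside of Q as a plausibility check. The following is a minimal Python sketch, not part of the Q system: the function name `stage_model` is hypothetical, LOGN is assumed to be the natural logarithm, and the thrust and burn-time inputs are the converged values printed in Figure 2. With those inputs both constraint residuals G1 and G2 should come out close to zero.

```python
import math

def stage_model(thrust, tburn, ispvac, xip, wpayld, gc):
    """Evaluate the Figure 1 vehicle model; lists are ordered stage 1..3."""
    w = wpayld
    dlvtot = 0.0
    tbtot = 0.0
    # I = 4 - L for L = 1..3, so the upper stage is processed first.
    for i in (2, 1, 0):
        isp = ispvac[i] * (1.0 - xip[i])
        wprop = thrust[i] * tburn[i] / isp
        wstage = 0.0234 * thrust[i] + wprop + 1.255 * wprop ** 0.704 + 4.0
        w += wstage
        mr = w / (w - wprop)                  # mass ratio of this stage
        dlvtot += gc * isp * math.log(mr)     # LOGN assumed to be natural log
        tbtot += tburn[i]
    return w, dlvtot, tbtot

# Converged independent variables as printed in Figure 2.
wtot, dlvtot, tbtot = stage_model(
    thrust=[5110.94, 1277.46, 328.288],
    tburn=[151.068, 133.079, 115.853],
    ispvac=[315.0, 315.0, 315.0],
    xip=[0.0, 0.0, 5.0e-3],
    wpayld=50.0,
    gc=32.174,
)
g1 = dlvtot - 2.8e4    # DELVIP from the DATA block
g2 = tbtot - 400.0     # TBTIP from the DATA block
print(wtot, g1, g2)
```

Because the Figure 2 values are printed to six significant figures, the residuals are only expected to vanish to within that rounding.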

Variable Values 

    DELVIP   2.80000E+04 
    DELV     1.06014E+04   9.30107E+03   8.09754E+03 
    DLVTOT   2.80000E+04 
    GC       3.21740E+01 
    G1       0. 
    G2       0. 
    ISPVAC   3.15000E+02   3.15000E+02   3.15000E+02 
    ISP      3.15000E+02   3.15000E+02   3.13425E+02 
    I        1.00000E+00 
    L        3.00000E+00 
    MR       2.84635E+00   2.50361E+00   2.23222E+00 
    STRFAC   8.51072E-01   7.95069E-01   7.14542E-01 
    TBTIP    4.00000E+02 
    TBTOT    4.00000E+02 
    TBURN    1.51068E+02   1.33079E+02   1.15853E+02 
    THRUST   5.11094E+03   1.27746E+03   3.28288E+02 
    WPAYLD   5.00000E+01 
    WPROP    2.45111E+03   5.39694E+02   1.21347E+02 
    WSTAGE   2.88002E+03   6.78801E+02   1.69824E+02 
    WTOT     3.77865E+03 
    W        3.77865E+03 
    XIP      0.            0.            5.00000E-03 

Figure 2 — SLANG printout of results from problem 
shown in Figure 1 

The second example demonstrates how a SOLVE 
loop can be used to match an integration boundary 
condition. The complete set of input is shown in Figure 
3. 

The expression in the argument of the integration 
statement is an equation for dt/dPc (where t = time, 
Pc = chamber pressure) during the start-up transient 
of a solid rocket engine. The problem is to determine 
the value of chamber pressure at a specified time. This 
value is the upper limit of integration, and is being 
computed such that the integrated time (TCOMP) 
matches the specified time (TSPEC). That is, when 
the value of the constraint, CONST, is zero, the upper 
integration limit PCSPEC is the value of chamber 
pressure at TSPEC. The final calculation of PERCNT 
computes the percentage of equilibrium chamber 
pressure, PCEQ, achieved at time TSPEC. PCEQ 
is computed from input data. The lower limit of integration, 
PCIG, is the ignition pressure, and is an input 
constant. 

    /SOLID ROCKET ENGINE START-UP TRANSIENT PROBLEM 
    / THE PURPOSE OF THIS PROBLEM IS TO DETERMINE 
    / THE PERCENTAGE OF EQUILIBRIUM CHAMBER PRESSURE 
    / ATTAINED BY AN END BURNING SOLID ROCKET ENGINE 
    / AT A SPECIFIED TIME (TSPEC) DURING ITS STARTUP 
    / TRANSIENT 
    / THE PROBLEM INVOLVES INTEGRATION, BOUNDARY CONDITION 
    / MATCHING, AND HAS A SOLVE LOOP 
    READ DATA 
    PCEQ = (12/32.174 * RHOP * CSTARO * A * K) ** (1/(1 - N - Q)) 
    SOLVE CONST 
    INDEPENDENT PCSPEC 
    FAC = VC / (GAM ** 2 * AT * 12) 
    LET TINTEG = INTEGRAL (1 / (CSTARO * PC ** 
   *    Q * PC * (RHOP * CSTARO * PC ** Q * A * 
   *    K * PC ** (N - 1) * 12 / 32.174 - 1)), 
   *    PC = PCIG TO PCSPEC IN 10 STEPS) 
    TCOMP = FAC * TINTEG 
    CONST = TCOMP - TSPEC 
    PRINT VARIABLES 
    END LOOP 
    PERCNT = PCSPEC * 100 / PCEQ 
    PRINT VARIABLES PERCNT 
    STOP 
    END 
    DATA 
    TSPEC = 0.5, 
    PCSPEC = 1500, 
    PCIG = 700, 
    RHOP = 0.064, 
    CSTARO = 3320, 
    A = 4.4E-4, 
    K = 172.65, 
    N = 0.745, 
    Q = 0.015, 
    VC = 220, 
    GAM = 0.66175, 
    AT = 0.35, 
    $END 

Figure 3 — SLANG formulation of boundary matching 
problem 


Structure of the Q system 

The Q system is basically a Compiler/Interpreter 
type package with the four major elements of the system 
shown in Figure 4. The user’s input language 
(SLANG) is converted by a set of system-supplied 
macros into the MODTRAN language. The MODTRAN 
compiler then converts this language into an 
assortment of pseudo instructions and some associated 
tables. These are processed by the link editor before 
going to the interpreter for execution. 

Figure 4 — Basic Q system elements 

With this system it is possible to omit the macro 
processor if the user chooses to write directly in MODTRAN. 
On the other hand, a user might wish to use 
only the macro processor to perform some transformations 
on BCD data. 

The ML/I processor was originally designed by P. 
J. Brown 4 of Cambridge University, who supplied the 
logic to TRW. The processor was converted to FORTRAN 
with little difficulty, and this version was included 
in the CUE system for making an initial pass 
at the input of non-programmer users. It was found 
that the average engineer in a hurry (for whom the 
system was designed) was unwilling to take the trouble 
of writing his own macros. Ideas for suitable macros 
were solicited from potential engineer users, and the 
resulting language was christened SLANG. Additions 
are continuously being made to SLANG to make it 
more useful. At one time it was planned to have four 
“dialects” of SLANG of increasing degrees of sophistication, 
but this idea was abandoned in favor of a 
single version. 

An example of how the processor converts SLANG 
macros to MODTRAN is shown in Figure 5. It is 
worth noting that the writing and debugging of macro 
definitions is considerably easier than would be the 
modification of the MODTRAN compiler itself. The 
programmer need in general be concerned only with 
the particular macro definition he is working on, and 
both his inputs and his outputs are in BCD. 

It was originally planned to incorporate some of 
the more popular SLANG variations into MODTRAN, 
thus reducing processing time; unfortunately this 
project has been continuously postponed because of 
more pressing work. The more recent versions of the 
Q system allow for relocatable subroutines, which have 
served to reduce machine time considerably. Previously 
an illusion of subroutines was created by suitable 
macros, but it was necessary to process the user’s en- 
tire input deck each time the equations were modified. 

The MODTRAN language bears a strong resemblance 
to FORTRAN or BASIC, since it was designed 
by FORTRAN programmers. Algebraic statements are 
essentially the same, and DO loops are provided that 
have the same function except that they provide for 
backward stepping when desired. Arrays are as in 
FORTRAN except that they are limited to two indexes. 
READ and WRITE statements are similar, as are 
FORMAT statements. All variables are floating point 
as in BASIC, and corrections are automatically made 
for round-off errors on comparisons. 



Figure 5 — Example of SLANG/MODTRAN conversion 


Some MODTRAN statements are unusual, as for 
example EXECUTE label, which will cause a transfer 
to the label. When a JUMPBACK statement is encountered, 
control is transferred to the statement following 
the EXECUTE label. 

The FORTRAN subroutine concept is used in 
MODTRAN, but the COMMON method of communi- 
cating between subroutines was eliminated in favor 
of using the names of the variables themselves to 
communicate locations, as in BASIC and other lan- 
guages. Another provision is that a variable can be 
typed as LOCAL to a particular subroutine, permitting 
subroutines to be written independently. The FOR- 
TRAN concept of calling sequence/argument list is 
used for communication between such subroutines, 
so that MODTRAN subroutines may be written and 
placed in the system library for general use. 

The MODTRAN compiler has no provision for 
user-written functions (arithmetic or other), which 
makes it possible to determine an indexed variable 
even though no suitable allocation statement has ap- 
peared. When the compiler encounters what appears to 
be an array (which could be a misspelled system func- 
tion), it processes the indices and assumes that by the 
time the statement is executed another statement 
making the allocation for the array will have been 
previously executed. The allocation statement can be 
either GLOBAL or LOCAL. For example, the state- 
ment: 

    GLOBAL X (NROW, NCOL), Y (10), Z               (8) 

will cause the release of any arrays previously associated 
with X and Y and the allocation of ten words 
to Y as well as the generation of an array NROW rows 
by NCOL columns for X. Such statements are executable, 
and once executed will apply to all other subroutines 
where the variables X and Y appear as globals. 
The variable Z in this statement is only given a global 
assignment by the compiler and that portion of the 
statement is not executable. If the compiler encounters 
a variable not defined as GLOBAL or LOCAL it 
assigns the variable to the nominal category previously 
defined by the user (normally GLOBAL). 

Generation of partial derivatives 

Perhaps the most interesting feature of the Q system 
is the way in which partial derivatives are treated. 
The MODTRAN language provides for specification 
of three levels of partials: 
    NO PARTIALS 
    FIRST PARTIALS List                            (9) 
    SECOND PARTIALS List 

In these statements, List specifies which variables 
are to be the independent variables. An INDEPEN- 
DENT List statement might also be used for this 
purpose. A typical set of statements might be: 

    SECOND PARTIALS X, Y, Z 
    F = Y * X/Z                                    (10) 
    D = F * F 

These statements will cause the dependent variables 
D and F to be evaluated and all of the first and second 
partial derivatives of these two variables with respect 
to X, Y, and Z will be computed. The resulting storage 
requirements can become quite large; in the case of 
three independent variables one word is required for 
the value, three for first partials, and six for second 
partials, making a total of ten words (see equation 
11). In the case of 15 independent variables 136 words 
of storage are required for each dependent variable. 
The system tries to hold down the total storage re- 
quired by returning the partial storage to the free 
area wherever possible. We are considering a scheme 
to reduce the number of words required in the case of 
a dependent variable that is not a function of all the 
independent variables. 
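The storage figures quoted above follow from counting one word for the value, n words for the first partials, and n(n+1)/2 words for the distinct second partials. The short Python sketch below reproduces that count; the function name `words_per_variable` is hypothetical and is not a Q routine.

```python
def words_per_variable(n_independent, second_partials=True):
    """Words of storage for one dependent variable and its partials."""
    n = n_independent
    words = 1 + n                      # the value plus n first partials
    if second_partials:
        words += n * (n + 1) // 2      # distinct (symmetric) second partials
    return words

print(words_per_variable(3))    # three independent variables
print(words_per_variable(15))   # fifteen independent variables
```

The quadratic growth of the second-partial term is what makes returning unused partial storage to the free area worthwhile.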

The actual operation of computing partial deriva- 
tives is carried out by the interpreter in the course of 
evaluating the given expressions of the problem. This 
evaluation consists essentially of a sequence of opera- 
tions, which may be unary (performed on a single 
variable), for example SIN (X) or binary (performed 
on two variables), for example X*Y. The result of an 
operation either becomes one of the variables going 
into the next operation or, if the sequence is complete, 
the result is stored as the answer in the appropriate 
location. An operation is performed by the interpreter 
causing a transfer to one of the appropriate subrou- 
tines. Each subroutine has either one primary input 
(unary), or two primary inputs (binary), and a single 
output. The inputs (operands) may or may not have 
partials, and if they do it may be necessary to compute 
only first partials or both first and second partials. 
Consider the division operator, for example; either or 
both the divisor and dividend may or may not have 
partials, leading to four different possible cases. Each 
case is different with respect to how the partials of 


the resultant variables are computed, and four separate 
subroutines have been written for the division operator; 
the appropriate subroutine is selected by the interpreter 
during the execution of the user’s program. If an equa- 
tion is evaluated several times, it is entirely possible 
that a variable may have partials during one evalua- 
tion and none during another, in which case the appro- 
priate subroutine would be executed during each 
evaluation. At the time that the link edit is performed 
every variable is given a core location assignment. 
If the variable has no partials then the value associ- 
ated with the variable is stored in this location. If, 
however, during the execution of the model the variable 
develops partial derivatives by being a function of 
variables which have partials, then a vector is opened 
for the variable and the initial location replaced by a 
pointer to this vector. As an illustration, consider the 
following sample vector for a variable F when there 
are three independent variables X, Y, and Z : 

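$$\left[\,F,\ \frac{\partial F}{\partial X},\ \frac{\partial F}{\partial Y},\ \frac{\partial F}{\partial Z},\ \frac{\partial^2 F}{\partial X\,\partial X},\ \frac{\partial^2 F}{\partial X\,\partial Y},\ \frac{\partial^2 F}{\partial X\,\partial Z},\ \frac{\partial^2 F}{\partial Y\,\partial Y},\ \frac{\partial^2 F}{\partial Y\,\partial Z},\ \frac{\partial^2 F}{\partial Z\,\partial Z}\,\right] \tag{11}$$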

All of the variables which have partials will have similar 
associated vectors. The independent variables will 
each have such a vector where all of the partials are 
zero except for the one corresponding to the derivative 
of the independent with respect to itself where a value 
of one will be stored. When an INDEPENDENT 
statement is encountered all of the vectors which 
happen to be active at that point are deleted and a 
new set of independent vectors set up. As the run 
progresses new dependent vectors will be allocated. 
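The bookkeeping described above, in which every operation carries the partials of its operands forward to its result, can be illustrated with a small Python sketch. This is an illustration under stated assumptions and not the Q interpreter: the names `Var`, `independent`, `mul`, and `sin_op` are hypothetical, the second partials are held in a full n-by-n matrix instead of the packed vector that Q uses, and the ordinary product and chain rules supply the update formulas.

```python
import math

class Var:
    """A value with its first and second partials w.r.t. n independents."""
    def __init__(self, value, grad, hess):
        self.value = value
        self.grad = grad      # list of n first partials
        self.hess = hess      # n x n nested list of second partials

def independent(value, index, n):
    """Seed an independent variable: its partial w.r.t. itself is one."""
    grad = [1.0 if i == index else 0.0 for i in range(n)]
    hess = [[0.0] * n for _ in range(n)]
    return Var(value, grad, hess)

def mul(d, e):
    """Binary operation F = D * E with first and second partials."""
    n = len(d.grad)
    grad = [d.value * e.grad[k] + d.grad[k] * e.value for k in range(n)]
    hess = [[d.hess[k][l] * e.value + e.hess[k][l] * d.value
             + d.grad[k] * e.grad[l] + d.grad[l] * e.grad[k]
             for l in range(n)] for k in range(n)]
    return Var(d.value * e.value, grad, hess)

def sin_op(e):
    """Unary operation F = sin(E) with first and second partials."""
    s1 = math.cos(e.value)     # first derivative of the operator
    s2 = -math.sin(e.value)    # second derivative of the operator
    n = len(e.grad)
    grad = [s1 * e.grad[k] for k in range(n)]
    hess = [[s1 * e.hess[k][l] + s2 * e.grad[k] * e.grad[l]
             for l in range(n)] for k in range(n)]
    return Var(math.sin(e.value), grad, hess)

x = independent(2.0, 0, 2)
y = independent(3.0, 1, 2)
f = mul(x, y)        # F = X * Y
g = sin_op(f)        # G = sin(X * Y)
print(f.value, f.grad, f.hess)
print(g.value, g.grad, g.hess)
```

Intermediate results such as `f` acquire partials only because their operands have them, which mirrors the way a vector is opened for a variable once it becomes a function of variables that have partials.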

In MODTRAN statements for unary operations, the 
subroutines tend to be similar except for the three 
lines for the evaluation of F, S1, and S2 (see below for 
definition of S1 and S2). In the example of Figure 6, 
SINX is used as the name of the interpreter subroutine 
for evaluating the sine of a variable. NUMIND 
indicates the number of independent variables, E is 
the operand vector, and F is the resultant vector. 

There would of course be similar routines for COS, 
EXP, TAN, etc., which might appear in the user’s 
input. In the general case all of these subroutines would 
be identical except for F, S1 and S2. Suppose oper 
corresponds to the unary operator that is being used, 
then F, S1 and S2 can be expressed in general as follows: 
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SUBROUTINE SINX(E,F) 
DIMENSION 

C0MM0N/NUMIND / NUMIND 

1 F(l) = SIN(E(1)) 

2 Sl= COS(E(l)) 

3 S2= -SIN(E(1) ) 
M=NUMIND 

DO 20 K=l, NUMIND 
IF (FIRST) G0 TO 20 
S3=S2*E(K) 

DO 10 L=1,K 
M»M+1 

10 F(M)=E(M)*S1+S3*E(L) 

20 F(K)=F(K)*S1 

RETURN 
END 


SUBROUTINE MUL(D,E,F) 

DIMENSION D(1),E(1),F(1) 

C0MM0N / NUMIND / NUMIND 

M=NUMIND 

DO 20 K-l, NUMIND 

DO 10 L«1,K 

IF(FIRST) G0 TO 20 

M-M+l 

10 F(M)-D(M)*E(1)+E(M)*D(1)+D(K)*E(L)+D(L)*E(K) 

20 F(K)-D(1)*E(K)+D(K)*E(1) 

F(1)“D(1)*E(1) 

RETURN 

END 


Figure 7 — Sample interpreter subroutine for binary 
operation 


Then from any table of derivatives 


dF 

dX 


_ dE dD 
~ D 'dx + &x • E 


(14) 


Figure 6 — Sample interpreter subroutine for unary 
operation 


F = oper (E) 

si . -foffg.) (12) 

dE 

ao _ d 2 oper(E) 

82 dE 2 

Should it be necessary to evaluate only first partials 
then at the time each of the subroutines is executed 
the logical variable FIRST will be set to true and the 
computing of the second partials will be bypassed. 

Binary functions vary considerably, but an example 
of this type of function is given in Figure 7 for the 
multiplication operation. D and E are the operands 
and F is the resultant vector. 

Perhaps it would be useful to demonstrate the man- 
ner in which the equations of the MUL routine were 
derived. Assuming for purposes of explanation that 
X & Y are the only independent variables then we 
know that 

(13) 


while 

d F 2 _ ^ d 2 E dD. dE dD. dE 
dXdY “ D ' dXdY + dY dX + dX dY 

(15) 


The reader should be able to convince himself that 
the statement at label 20 on Figure 7 corresponds 
to (14) while the statement at label 10 corresponds 
to (15). It should also be possible to place these state- 
ments in the context of a generalized number of in- 
dependent variables by referencing equation No. 11. 

Tabular functions defined by arrays of input data
are handled by a system routine which fits a polynomial
to the data and then assumes that the derivatives
of the polynomial correspond to those of the
function. This is of course rather cumbersome and
the results may not be accurate for many functions.

System supplied routines 

In addition to the usual system-supplied routines
such as those illustrated above, the Q system attempts
to provide rather elaborate sets of routines which are
called algorithms. These routines should relieve the
user of some of the burden of providing a method of
solution. They are kept in the Q FORTRAN library
and are called as needed. Since one of the main features
of the system is the ability to take partial derivatives,
it is not surprising that most of these routines
are built around this capability. The most important
and most frequently used of these algorithms are
called SOLVE, OPTIM, and INTEG.

The SOLVE algorithm makes use of the Newton- 
Raphson technique in order to drive specified func- 
tions to zero. In order to do this it is necessary to 
evaluate the first partial derivative of the functions 
and apply correction factors to the independent variables
based on this information until the convergence
criterion is met. Since it is possible to obtain first
partial derivatives by numerical techniques this 
method of solving functions is rather common. The 
partials of the Q system should be more accurate, how- 
ever, especially in the neighborhood of singularities. 
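The Newton-Raphson iteration described above can be sketched as follows. This is a minimal illustration, not the SOLVE algorithm itself: the names `newton_raphson` and `solve_linear`, the tolerance, and the test system are all assumptions, and the exact first partials are supplied by hand where the Q system would generate them.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ZeroDivisionError("singular matrix of partials")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def newton_raphson(system, x, tol=1e-12, max_iter=50):
    """Drive every function returned by `system` to zero.

    `system(x)` returns (function values, matrix of first partials).
    """
    for _ in range(max_iter):
        values, partials = system(x)
        if max(abs(v) for v in values) < tol:
            return x
        # Correction to the independent variables: partials * step = -values
        step = solve_linear(partials, [-v for v in values])
        x = [xi + si for xi, si in zip(x, step)]
    raise RuntimeError("convergence criterion not met")


def circle_and_line(x):
    """Hypothetical test system: u^2 + v^2 = 4 and u = v."""
    u, v = x
    return [u * u + v * v - 4.0, u - v], [[2.0 * u, 2.0 * v], [1.0, -1.0]]


root = newton_raphson(circle_and_line, [1.0, 2.0])
```

With exact partials the iteration converges quadratically near the root; replacing the hand-written partials with finite differences would give the "numerical techniques" variant the text compares against.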

The optimization function is initiated by writing 
MAXIMIZE, MINIMIZE, or CRITICALIZE fol- 
lowed by the variable to be optimized and an INDE- 
PENDENT statement for the variables the system 
will vary in an attempt to find a solution. The partial 
derivatives play a major role in this algorithm. Originally
the system made use of Lagrangian multipliers in
conjunction with the Newton-Raphson technique for
optimization, but this method has been superseded
by a modified version of rotational discrimination, as 
described by Law and Fariss. 5 

The INTEG algorithm is used to integrate a set of 
simultaneous differential equations by a fourth-order 
Runge-Kutta method. It can be combined with the 
SOLVE algorithm to solve two-point boundary value 
problems, as in the second SLANG example given 
earlier. In this case the INTEG routine is imbedded
within a SOLVE loop, where the SOLVE operation
adjusts the unknowns until the end points match certain
expressions. Other routines are available to save and
restore partial derivatives, to add and delete independent
variables, to input or print out all global variables,
etc.
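A fourth-order Runge-Kutta integration of a set of simultaneous equations can be sketched as below. The text does not say which fourth-order variant INTEG uses, so the classical scheme with a fixed step is assumed here, and the names `rk4_integrate` and `oscillator` are illustrative only.

```python
import math


def rk4_integrate(deriv, t, y, t_end, steps):
    """Integrate y' = deriv(t, y) from t to t_end in `steps` equal steps."""
    h = (t_end - t) / steps
    for _ in range(steps):
        k1 = deriv(t, y)
        k2 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = deriv(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def oscillator(t, y):
    """Hypothetical test system: y0' = y1, y1' = -y0."""
    return [y[1], -y[0]]


end_state = rk4_integrate(oscillator, 0.0, [1.0, 0.0], 1.0, 100)
```

For a two-point boundary value problem, a routine of this kind would be called inside the Newton-Raphson loop, with the unknown initial values as the independent variables and the mismatch at the far end point as the functions to be driven to zero.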

Implementation of the system 

As implemented on the CDC SCOPE system, Q
requires two back-to-back executions under SCOPE
with a compilation by the SCOPE FORTRAN compiler
separating the executions. The user need not be
aware of these efforts in his behalf, however, as he submits
one job and gets one output. It is even possible
to place the SCOPE control cards necessary to run
the Q system onto a file, along with the various other
files required by the system, so that the user need only
see a few of the SCOPE control cards.

In the first execution under the Q system a basic 
monitor surveys the user control cards to determine 
the objective of the decks which the user supplies. 
Thus in one run the user might have some SLANG 
decks to be sent via the ML/I processor to the MOD- 
TRAN compiler, some MODTRAN decks which 
would go directly to that compiler, some FORTRAN 
decks for compilation by the SCOPE FORTRAN 
compiler when it is called in between the executions, 
and perhaps even some FORTRAN and/or MOD- 
TRAN relocatable decks. Control cards are intermixed 
with other input and some action is normally taken 
immediately with the cards following a control card 
until the next control card is encountered. Sometimes
a set of cards is sent directly to a processor, such as a 
MODTRAN deck going to the MODTRAN compiler, 
while in other cases it is necessary to place the deck 
on a file for later processing, such as a FORTRAN 
deck. The flow of operations is shown in Figure 8. 

Once all the user’s input except for data cards has 
been read and either processed or assigned, the link 
editor is called in to tie together the various MOD- 
TRAN routines. The link editor assigns all of the 
variables to their final locations in the data portion 
of the bucket and performs the required relocation of 
the pseudo instructions. An attempt is made to satisfy 
all of the externals referenced from MODTRAN 
routines with MODTRAN entry points, including 
a search of the Q MODTRAN library file. The references 
which are still unsatisfied are assumed to be for FOR- 
TRAN routines and a search of the Q FORTRAN 
library file is performed. Any routines found there are 
pulled off for loading by the next execution. At this 
point the link editor writes a FORTRAN routine 
which will be compiled by the CDC FORTRAN 
compiler. This routine consists of a computed GO TO 
followed by a call to each of the routines which it has 
determined are FORTRAN. For example it might 
write: 

      SUBROUTINE CALLI
      COMMON /N/ N
      GO TO (1,2), N
    1 CALL INTEG (15)
      RETURN
    2 CALL BROP
      RETURN
      END
Figure 8 — Flow diagram of the Q system


Actually the routine CALLI will be more complicated 
than this example, since the user is allowed to have 
arguments to these FORTRAN routines. The basic 
concept is, however, that this is the manner in which
it is made possible to call a FORTRAN routine from a
MODTRAN routine. Should the user, for example,
write

CALL INTEG (16) 

in MODTRAN he will in actual fact be calling sub- 
routine CALLI with N set equal to 1. Since the 
routine CALLI is placed in the input stream to the 
FORTRAN compiler the user receives a listing of 
this routine in the middle of his output. It was not 
deemed worthwhile to try to suppress this listing, 
since the user might very well be compiling some of 
his own routines on the same call to the FORTRAN 
compiler. 
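The role of the generated CALLI routine, a single entry point that selects a FORTRAN routine by the index N, can be sketched as a dispatch table. The sketch is only an analogy in a different language: `make_calli`, and the stand-in bodies of `integ` and `brop`, are hypothetical and take no arguments.

```python
def make_calli(routines):
    """Build a dispatcher playing the part of the generated CALLI routine.

    `routines` is the ordered list the link editor would have collected;
    index n (1-based, as in a computed GO TO) selects the routine to call.
    """
    def calli(n, *args):
        if not 1 <= n <= len(routines):
            raise ValueError("no routine assigned to index %d" % n)
        return routines[n - 1](*args)
    return calli


def integ():
    return "INTEG called"    # stand-in for the library routine


def brop():
    return "BROP called"     # stand-in for a user routine


calli = make_calli([integ, brop])
```

A call written as CALL INTEG in MODTRAN would then correspond to `calli(1)`: the interpreter needs to know only one FORTRAN entry point, and the index carries the rest.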

After the link editor has relinquished control to 
the FORTRAN compiler and that processor has com- 
pleted its task, the second execution of the user’s job 
begins. This consists of a loading of the Q interpreter 
and all of the FORTRAN routines which have been 
collected by the link editor on the previous execution
and placed on the FORTRAN relocatable file. The
nature of this core load varies radically depending on 
what the user requires. Control is initially passed to 
the main MODTRAN routine but after that the user 
is on his own. 

During execution of a MODTRAN routine, the 
pseudo instructions put out by the MODTRAN com- 
piler are being interpreted. As is usual with interpretive 
schemes, quite a bit of control can be exercised in 
making sure that the user is not getting into trouble 
and in taking some appropriate action when he at- 
tempts to do something which would be improper. 

There are three user data areas in the Q system: 
variables, arrays, and partials. The three areas are 
rather heavily intertwined with pointers, a pointer 
being distinguishable from a value by the fact that 
it is a positive integer while a value is a normalized 
floating point number. Initially only the variable 
area is assigned (by the link editor) and the interpre- 
tation of the user’s program causes the buildup of the 
other two areas. Thus suppose the user says 

GLOBAL X(10) (17)

where X was previously only a value. An array will be 
opened in the array area and the location at which X 
was assigned will be replaced by a pointer to the array. 
A double tag system as described by Knuth 6 is used 
for the allocation of arrays, a system which allows a 
good method of returning variable length arrays to 
the free area. Two more words are used to specify the 
dimensions on the array, causing the use of four words 
in addition to the actual size of the user’s array. When 
an array is released, the two words which were used 
for indexing are replaced by linking pointers to facili- 
tate the search for free areas of adequate size. The user 
of course need not be aware of this process when he 
opens or closes an array. 

It is also frequently the case that a variable will 
not only have a value associated with it but will have 
some partial derivatives. In this case the location of 
the variable, or the indexed location within its array, 
is replaced by a pointer into the partial area. At the 
location in the partial area the value and the asso- 
ciated partials are stored. Some rather complicated 
chaining-down pointers may result before the desired 
location is finally achieved; but normally if the user 
is taking partials he will be spending most of his time 
during execution doing just that, computing partials, 
and the time spent on pointers will be relatively small. 
It was also necessary to make some provision for re- 
turning these partial vectors to the free area, but this 
is a rather simple matter since all of these vectors
are of the same length. 

Additional complications are entered into the system 
when the user performs such operations as saving 
partials and beginning a new set. This is basically 
performed by closing off the current partial area and 
opening up a new one. A swapping of pointers with 
values occurs so that the partials can be restored later. 

SUMMARY 

No claims are made that the Q system is a direct chal- 
lenge to other computer systems. It does, however, 
offer an approach to some rather difficult problems.
As was pointed out earlier, it is easy to introduce 
modifications into the SLANG language, a charac- 
teristic which is not common to programming lan- 
guages. It is also rather easy to introduce new al- 
gorithms into the system, thereby expanding its 
problem solving capability. It is hoped that the Q 
system constitutes a basis for further development 
along these lines since the user is frequently denied 
this flexibility in a computer system. 
Self-contained exponentiation*


by NANCY W. CLARK and W. J. CODY 

Argonne National Laboratory 
Argonne, Illinois 


INTRODUCTION 

The traditional implementation for floating-point 
exponentiation, x raised to the y power, is to compute 
exp(y ln(x)) using standard subroutines for the
logarithm and the exponential function. While it is 
possible to provide extremely accurate subroutines 
for these latter functions, we shall shortly see that 
this is seldom done. Even in those rare cases where 
excellent subroutines are available, the exponentiation 
routine, for sound theoretical reasons, is poor. In this 
paper, we present brief statistics indicative of the 
quality of these three subroutines in the basic Fortran 
libraries provided by various manufacturers, a de- 
tailed error analysis for exponentiation, and a method 
for exponentiation via self-contained subroutines. 

In the following discussion we will use the term
exponentiation to refer to $x^y$ where we will always assume
$x > 0$. The term exponential will refer to $c^y$
where $c$ is a fixed constant base, usually either 2 or $e$.

The present situation 

With the cooperation of a number of different individuals
and computing centers, we ran some simple
tests on the exponential, logarithm and exponentiation
subroutines in the basic Fortran libraries on eight
different computers representing six different manufacturers.
The only version of the single-precision
library on the CDC-3600 available to us contained
routines we had written according to the methods to
be described and does not necessarily represent the
manufacturer’s library. We also tested our own version
of the library for the IBM S/360 in addition to the
standard library.

* Work performed under the auspices of the U. S. Atomic Energy
Commission.

These tests were not intended to be complete certifi- 
cations of the routines tested, but were designed to 
lightly probe areas where such subroutines are most 
likely to have trouble. The tests consisted of compu- 
tations with a series of arguments exactly representable 
in binary notation. The corresponding function values 
were output in octal or hexadecimal form and compared 
against similar computations in 96-bit arithmetic on 
a CDC 6400. The computations involved were: 


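$$
\begin{array}{lll}
\exp(x), & x = 40(1)88, & \\
\ln(x), & x = .25(.015625)2.0, & \\
x{*}{*}y, & (x, y) = (2^n,\ 22 - n),\ \ (4^n,\ 11 - n/2), & n = 0(1)22, \\
 & (x, y) = (2^n,\ 44 - 4n),\ \ (.75 \times 2^n,\ 46 - 4n), & n = 1(1)11.
\end{array}
$$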


The test results are summarized in Table I. 

Certain of the computers used have either octal or 
hexadecimal floating-point arithmetic. On these com- 
puters, a mantissa can be properly normalized and 
still have the first two or three bits zero. This accounts 
for the apparent tabular discrepancies between the 
sum of the maximum number of bits in error and the 
minimum number of correct bits, and the total number 
of bits in the mantissa on these machines.
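The effect of a hexadecimal exponent on the leading bits of a normalized mantissa can be illustrated with a short sketch. The function name is hypothetical, and the calculation describes only the number format, not any particular machine's library.

```python
import math


def leading_zero_bits_base16(x):
    """Leading zero bits in the mantissa of x > 0 when normalized in base 16.

    A base-16 normalized mantissa lies in [1/16, 1), so its binary
    expansion may begin with up to three zero bits.
    """
    _, e = math.frexp(x)       # x = m * 2**e with 1/2 <= m < 1
    hex_exponent = -(-e // 4)  # ceiling of e/4
    return 4 * hex_exponent - e


zeros_for_one = leading_zero_bits_base16(1.0)
```

The value 1.0 is stored as 16 × (1/16), and 1/16 is .0001 in binary, so three mantissa bits carry no information. The same argument with a base-8 exponent gives at most two such bits.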
TABLE I — Accuracy Test Results

                                        Single-Precision   Double-Precision
Machine and Subroutine                     M      N           M      N

Burroughs B-5000 (39 bit mantissa; 78 bit mantissa)
    EXP                                    9     30           8     69
    LN                                     3     35           7     71
    X**Y                                   7     31          11     67

Control Data 3600 (36 bit mantissa, Argonne library; 84 bit mantissa, CDC library)
    EXP                                    1     35           4     80
    LN                                     2     34           5     79
    X**Y                                   1     35           8     76

Control Data 6400 (48 bit mantissa)
    EXP                                    1     47
    LN                                     2     46
    X**Y                                   7     41

G.E. 225 (30 bit mantissa, FIZMOP system)
    EXP                                    3     27
    LN                                    12     18
    X**Y                                  10     20

G.E. 645 (27 bit mantissa; 63 bit mantissa)
    EXP                                    1     26          14     49
    LN                                     4     23           4     59
    X**Y                                   1     26          14     49

IBM 360/75, IBM library (24 bit mantissa; 56 bit mantissa)
    EXP                                    1     21           7     49
    LN                                     3     20           3     52
    X**Y                                  10     14          10     46

IBM 360/75, Argonne library (24 bit mantissa; 56 bit mantissa)
    EXP                                    1     21           1     52
    LN                                     2     21           2     52
    X**Y                                   2     21           1     52

SDS Sigma 7 (24 bit mantissa; 56 bit mantissa)
    EXP                                    4     20           8     48
    LN                                     4     19           4     50
    X**Y                                   8     15           8     46

Univac 1107 (27 bit mantissa; 54 bit mantissa)
    EXP                                    2     25           4     50
    LN                                     6     21           7     47
    X**Y                                   6     21          10     44

Univac 1108 (27 bit mantissa; 60 bit mantissa)
    EXP                                    2     25           8     52
    LN                                     6     21           6     54
    X**Y                                   8     19           9     51

M = maximum number of bits in error. N = minimum number of correct significant bits.


We will show presently that accuracy in exponentia- 
tion depends very heavily on the accuracy in the 
calculation of the exponential function. Note, however,
that even with a good exponential function, as is 
apparently the case in the single precision CDC 6400 
and the original IBM 360 libraries, the exponentiation 
routine can still be in error by two to three significant 
decimal places or more. Also note that the exponentia- 
tion routines corresponding to our methods as well 
as the single-precision routine on the G.E. 645 display
primarily round-off error in these tests. 

Error analysis 

There are two major types of error in any function 
subroutine. The first is transmitted error, i.e., error 
due to small errors in the arguments. If we assume 


$$z = f(x)$$

where $f(x)$ is differentiable, then

$$\delta z \approx x\,\frac{f'(x)}{f(x)}\,\delta x \tag{1}$$

where

$$\delta z = \Delta z / z \approx dz/z \tag{2}$$

denotes the relative error in $z$, and $\Delta z$ denotes the
absolute error in $z$. It is clear that the transmitted
error, $\delta z$, depends solely on the inherited error, $\delta x$,
and not on the subroutine. The second type of error
is generated error, i.e., that error generated by the
computational process. This includes both errors due
to truncating an infinite process at some finite point 
and roundoff errors. 

Even infinitely precise subroutines have no control 
over inherited error. Therefore, in designing subroutines 
we assume there is no inherited error and seek to 
minimize the generated error. 
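Equation (1) can be checked numerically for a particular function. The sketch below uses the exponential, for which the amplification factor $x f'(x)/f(x)$ is simply $x$; the function name and the chosen argument are illustrative assumptions.

```python
import math


def transmitted_relative_error(f, fprime, x, rel_error_x):
    """First-order transmitted relative error of f at x, per equation (1)."""
    return x * fprime(x) / f(x) * rel_error_x


x = 50.0
dx_rel = 1e-10                 # inherited relative error in the argument

predicted = transmitted_relative_error(math.exp, math.exp, x, dx_rel)
observed = math.exp(x * (1.0 + dx_rel)) / math.exp(x) - 1.0
```

For this argument a relative error of 1e-10 in x becomes a relative error of about 5e-9 in exp(x), whatever the quality of the subroutine, which is why the design effort goes into the generated error instead.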

Now let us consider the logarithm-exponential 
method for exponentiation. We use the relation 

$$x^y = c^w, \quad x > 0, \tag{3}$$

where

$$w = ys$$

and

$$s = \log_c(x).$$

From (1) and (2), and recalling our assumption that
$\delta x = \delta y = 0$, we see

$$\Delta w = y\,\Delta s$$

where $\Delta s$ represents only the generated error from the
logarithm computation.

If

$$u = c^w,$$

then

$$\delta u = \ln c\,\Delta w + \delta G(w) \tag{4}$$

where $\delta G(w)$ denotes the generated relative error from
the exponential computation. For good exponential
routines $\delta G(w)$ affects only the least significant one
or two bits of u. Thus, the relative error in the exponen- 
tiation is essentially proportional to the absolute 
error in w. Clearly, we want to minimize Aw as it 
appears to the exponential routine. 

There are two major contributions to this error: 
the generated error from the logarithm calculation, 
and the finite word length of the computer. The 
second is by far the more important of the two. Suppose
the floating-point mantissa of the calculator
contains $2t$ significant bits, but $w$ is of the order of
$2^t$. Then the floating-point representation of $w$, the argument
to be passed to a standard exponential routine,
may have a rounding error as large as $2^{-t}$, i.e., $\Delta w \approx 2^{-t}$.
Consequently, $u$ may be accurate to only about $t$ bits
independently of the accuracy of the logarithm calculation.
This is the reason some of our tests found inaccurate
exponentiation even though the logarithm
and exponential routines appeared to be reasonably 
accurate. 
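The loss caused by the finite word length alone can be estimated from Eq. (4) with the generated error of the exponential set to zero. The sketch below does this for IEEE double precision (a 53 bit mantissa) and base $c = e$; both choices are assumptions made so that the example runs on a present-day machine, and the function name is hypothetical. It requires Python 3.9 or later for `math.ulp`.

```python
import math


def bits_lost_to_rounding(w):
    """Bits of u = e**w lost because w itself is rounded to a double.

    Rounding w to the nearest double gives an absolute error up to half
    a unit in the last place; by Eq. (4) with ln c = 1 this is also the
    relative error transmitted to u. The result is that error expressed
    in units of the rounding error of a correctly rounded u, as a
    power of two.
    """
    half_ulp = math.ulp(w) / 2.0
    unit_roundoff = 2.0 ** -53
    return math.log2(half_ulp / unit_roundoff)


lost_at_700 = bits_lost_to_rounding(700.0)
```

An exponent near 700 has nine bits to the left of the binary point, and the estimate shows roughly the same number of bits lost in the result, however accurate the logarithm and exponential routines are individually.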

A new approach 

There are at least two alternatives to the traditional 
computation. One is to resort to “overkill” by carrying 
out the traditional computation in a higher precision 
arithmetic. This is expensive in time; it is easy to do 
for single-precision routines, but difficult for double 
precision routines. (Is this the approach on the G. E. 
645?) The second alternative is to raise the status 
of exponentiation routines. At the moment they are 
considered to be secondary routines which call upon 
the primary routines for the exponential and logarithm. 
We propose that they become primary, self-contained 
routines with possible secondary entry points for the 
exponential and logarithm. 

If we accept this major reversal in philosophy, we 
free the computation of several restrictions. For 
example, we need not pick c = e in Eqs. (3) and (4), 
but can make the choice c = 2 which appears most 
natural for a computer, and which introduces the 
factor ln 2 = .69315 in Eq. (4). This permits us to
obtain extra significance in the results of the logarithm 
computation, as we shall shortly see, and to retain 
this significance throughout the remainder of the calcu- 
lation. 

The first implementations of the algorithm we will 
outline were programmed using single-precision fixed- 
point arithmetic to do single-precision exponentiation 
on both the CDC 3600 and the IBM 360 computers. 
Because neither computer allows efficient double- 
precision fixed-point arithmetic, the algorithm has to 
be modified to use double-precision floating-point 
arithmetic to do double-precision exponentiation. So 
that the presentation will not be too abstract, we will 
present basically the algorithm as used on the IBM 
360 in double-precision. Modifications for single- 
precision floating point or fixed point versions, or for 
other machines should be obvious. 

We first reduce the range over which the logarithm 
must be approximated. Let 

$$x = 2^k \cdot m, \quad 1/2 \le m < 1,$$

and choose

$$b = n/16$$

and

$$a = 2^{-n/16},$$

$n$ an odd positive integer less than 16, such that

$$x = 2^{k-b}\, m/a$$

where

$$|\log_2(m/a)| \le 1/16.$$

Then

$$s = \log_2(x) = s_1 + s_2$$

where

$$s_1 = k - b,$$

$$s_2 = \log_2\left(\frac{1 + z}{1 - z}\right), \qquad \frac{1 + z}{1 - z} = \frac{m}{a},$$

and

$$z = \frac{m - a}{m + a}.$$

Since $z$ is quite small ($|z| < .022$), $s_2$ is easily computed
to full floating-point accuracy using a low order rational
approximation, or even the first few terms of the Taylor
series, provided $z$ is computed accurately. (A little
extra care is necessary at this point in base 16 floating-point
but we will not go into the details here.) Since $x$
is assumed to be exact, $m$ is exact and we can achieve
full precision in $m - a$ by breaking the constant $a$ into
two parts such that

$$a = a_1 + a_2$$

to the precision desired and such that the exponent on
$a_2$ is much less than that on $a_1$. Then the computation

$$m - a = (m - a_1) - a_2$$

will retain the low order bits of $a$. Normal floating-point
can be used for the rest of the evaluation of $s_2$.

Note that by carrying $s_1$ as one floating point number,
and $s_2$ as another, we have rather painlessly
achieved a logarithm accurate to well beyond usual
working precision. Since $|s_2| < 1/16$, the absolute error
in $s$ is now about $2^{-4}$ times the normal relative error in
floating point. Careful multiplication of $s$ by $y$ will
minimize the crucial quantity $\Delta w$. At this point, the
usefulness of fixed-point arithmetic with the extra
significant bits in the representation of a number is
apparent. When such arithmetic is not available, as
we have assumed is the case, it is necessary to arrange
the floating-point computations to achieve the extra
significance at minimal cost. This is done as follows.

Let us say we reduce a number $z$ when we write it
in the form

$$z = z_1 + z_2$$

such that $16 z_1$ is the integer part of $16z$, so that $z_1$ is
an integer multiple of 1/16 and $|z_2| < 1/16$. Essentially,
then, $s$ is already in reduced form. We compute the
exponent $w$ in reduced form by writing

$$y = y_1 + y_2,$$

where $y_1$ and $y_2$ are the double-precision representations
of the most significant and least significant halves
of $y$ respectively, and forming the products $s_1 y_1$, $s_2 y_1$,
and $s y_2$. Each of these quantities is again reduced and
the results combined to form the reduced

$$w = w_1 + w_2.$$

Now $w_1$ is of the form

$$w_1 = i + j/16$$

where $i$ and $j$ are integers. We then finally compute the
exponential value

$$u = 2^i \times 2^{j/16} \times 2^{w_2}. \tag{5}$$

Since $|w_2| < 1/16$, a Taylor series computation of the
exponential is quite efficient, although we used rational
Chebyshev approximations. The quantities $2^{j/16}$ can be
carried in a table. In fact, if Eq. (5) is rewritten as

$$u = 2^{i+1} \times 2^{(j-16)/16} \times 2^{w_2}$$

and the quantities $2^{-n/16}$ are tabulated, the same table
can be used for the constant $a$ needed in the logarithm
computation. This dictates the form of the earlier
decomposition of $a$ into $a_1$ and $a_2$. Clearly $a_1$ should be
the value of $a$ correctly rounded to working precision
while $a_2$ becomes a positive or negative correction term.
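The structure of the method described above can be sketched as follows. This is an illustration of the data flow only and makes several simplifying assumptions that the authors' routines do not: the table of $2^{-n/16}$ is computed at load time, not stored as split constants $a_1 + a_2$; the careful product of $s$ and $y$ is replaced by exact rational arithmetic from the standard `fractions` module; truncated Taylor series stand in for the rational approximations; and overflow is not handled. All function names are hypothetical.

```python
import math
from fractions import Fraction

# TABLE[n] = 2**(-n/16), shared by the logarithm and the exponential.
TABLE = [2.0 ** (-n / 16.0) for n in range(17)]
LN_2 = math.log(2.0)


def log2_reduced(x):
    """Return (s1, s2) with log2(x) = s1 + s2, s1 a multiple of 1/16."""
    m, k = math.frexp(x)                  # x = m * 2**k, 1/2 <= m < 1
    n = 1
    while n < 15 and m < TABLE[n + 1]:    # odd n with |log2(m/a)| <= 1/16
        n += 2
    a = TABLE[n]
    z = (m - a) / (m + a)                 # |z| < .022
    z2 = z * z
    # log2((1+z)/(1-z)) = (2/ln 2) * (z + z^3/3 + z^5/5 + ...)
    series = z * (1.0 + z2 * (1 / 3 + z2 * (1 / 5 + z2 * (1 / 7 + z2 / 9))))
    return Fraction(16 * k - n, 16), 2.0 / LN_2 * series


def exp2_reduced(w1, w2):
    """Return 2**(w1 + w2) for w1 = i + j/16 and |w2| < 1/16."""
    i = math.floor(w1)
    j = int((w1 - i) * 16)                # 0 <= j <= 15
    t = w2 * LN_2
    term, total = 1.0, 1.0
    for order in range(1, 10):            # Taylor series for exp(t)
        term *= t / order
        total += term
    # u = 2**(i+1) * 2**((j-16)/16) * 2**w2
    return math.ldexp(TABLE[16 - j] * total, i + 1)


def power(x, y):
    """Sketch of self-contained x**y for x > 0."""
    s1, s2 = log2_reduced(x)
    w = Fraction(y) * (s1 + Fraction(s2))   # exact stand-in for the careful product
    w1 = Fraction(math.trunc(16 * w), 16)   # 16*w1 is the integer part of 16*w
    return exp2_reduced(w1, float(w - w1))


value = power(1.5, 20.25)
```

Because $a$ is not split here, $z$ inherits the rounding error of the tabulated constant and the sketch does not reach the accuracy reported for the authors' routines; it is meant to show how $s_1$, $s_2$, $w_1$ and $w_2$ pass through the computation.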
TABLE II — Random argument tests on conventional double-precision X**Y on IBM 360/75

Argument Range                  Frequency of Bit Errors (no. of bits in error)                      Max. Rel.   RMS
x                y              0    1    2    3    4    5    6    7    8    9   10  other         Error       Rel. Error

(1/16, 16)       (-4, 4)      272  467  405  371  240  197   47    1    0    0    0    0           1.25E-15    3.65E-16
(2^-16, 2^16)    (-16, 16)     78  123  153  168  247  377  321  294  195   44    0    0           8.82E-15    2.70E-15
(2^-32, 2^32)    (-8, 8)       80  109  131  152  216  288  295  234  241  120   86   48           5.08E-14    9.60E-15
(2^-64, 2^64)    (-4, 4)       57   95  115  126  161  215  293  352  303  192   82    9           2.68E-14    6.97E-15
(2^-8, 2^8)      (-32, 32)     59   90  115  109  199  312  406  343  253  107    7    0           1.40E-14    4.02E-15
(1/16, 16)       (-64, 64)     60   96  110  128  196  275  318  318  281  167   48    3           1.95E-14    5.73E-15

Average execution time for (x, y) random in (0, 1) = 195 μsecs.


TABLE III — Random argument tests on self-contained double-precision X**Y on IBM 360/75

Argument Range                  Frequency of Bit Errors (no. of bits in error)      Max. Rel.   RMS
x                y                 0     1     2     3     4                        Error       Rel. Error

(1/16, 16)       (-4, 4)        1301   677    22     0     0                        2.22E-16    6.24E-17
(2^-16, 2^16)    (-16, 16)      1206   759    35     0     0                        2.22E-16    6.11E-17
(2^-32, 2^32)    (-8, 8)        1314   667    19     0     0                        2.22E-16    5.81E-17
(2^-64, 2^64)    (-4, 4)        1350   634    16     0     0                        2.21E-16    5.44E-17
(2^-8, 2^8)      (-32, 32)      1097   812    89     2     0                        2.22E-16    6.31E-17
(1/16, 16)       (-64, 64)       872   823   250    52     3                        2.22E-16    6.94E-17

Average execution time for (x, y) random in (0, 1) = 180 μsecs.


Since the last two factors in $u$ are each less than unity
in magnitude, and the $2^{i+1}$ factor affects only the
floating-point exponent, we see that the construction
of $u$ from its factors is a stable process. Note that the
error $\Delta w$, hence by Eq. (4) the error $\delta u$, now depends
primarily on the magnitude of $y$. Using Eq. (4), and
noting that we have gained an extra four bits in our
calculation of $s$, we see that $y$ must be greater than
roughly 32 before the inaccuracies in $w$ become large
enough to greatly affect $\delta u$. To verify this point, and
to provide an in-depth comparison of our method and
of the traditional computation, we have subjected our
routine for the IBM 360 and the original IBM routine
to a full certification as described in references one
and two. The results, for identical tests, are presented
in Tables II and III.

One final word about the fixed point version of
this algorithm. In fixed point, the extra bits over the
normal floating point mantissa length are already
available. As we have indicated, the decomposition of
$a$ and $y$ and the reduction of $s$, $w$, etc., are no longer
necessary. This constitutes a savings in storage as
well as in the number of instructions to be executed.

But no matter which approach is taken, the fixed 
point or the floating point, the self-contained routine 
can be expected to be competitive timewise with the 
traditional routine because we have saved the over- 
head of linking with other subroutines. All three of 
the self-contained programs we have written 
are actually faster than their traditional counterparts. 
The price is paid in terms of storage. This price can 
be minimized by incorporating entries for the ex- 
ponential and logarithm routines into the exponen- 
tiation routine, thus eliminating separate routines for 
the former. 
DCDS digital simulating system*


by H. POTASH, A. TYRRILL, D. ALLEN, 

S. JOSEPH, and G. ESTRIN 

University of California 
Los Angeles, California 


INTRODUCTION— SIMULATION SYSTEMS 

To see a world in a grain of sand 
And a heaven in a wild flower,

Hold infinity in the palm of your hand 
And eternity in an hour. 

— William Blake 

This article is concerned with the problems of digital 
simulation and describes methods used in the Digital 
Control Design System (DCDS) 1 for the simulation 
of digital structures. The paper is divided into five 
parts: 

• A short introduction to DCDS, its structure and 
purposes. 

• A discussion of simulation techniques, entities and 
attributes. 

• The DCDS pseudo machine simulator.

• The pseudo machine program.

• A simple example of a DCDL program.

DCDS, its structure and purposes

The Digital Control Design System (DCDS) was 
developed at the University of California at Los 
Angeles to aid in the design and architecture of com- 
puter systems. The design system operates under the 
following assumptions:


* This research was supported in part by the Atomic Energy
Commission AT(11-1) Gen 10 Project 14, and the Office of Naval
Research, Information Systems Branch, N00014-67-A-0111-0016.


1. A set of basic building blocks whose properties 
are known is available. 

2. An instruction set or task assignment for the 
computer system is defined along with cost and 
performance constraints. 

3. Using his experience and intuition, the designer 
generates an ensemble of modules. These mod- 
ules form the system’s building blocks which the 
designer believes will perform the stated func- 
tions effectively. 

Given the above (1—3), the digital system must 
be describable to a design aid system. The designer 
then needs a language, its translator, and an operating 
system with the following properties: 

4. The set of functions to be performed can be 
described. 

5. The building blocks, their interconnection, and 
their place and function within the ensemble 
can be described. 

6. A computer program can generate a fabrication 
description of control modules capable of going 
through a sequence of states necessary to have 
the system perform the above functions. The 
designer may specify synchronous or asyn- 
chronous control systems. 

7. A simulator can accept the descriptions in (4) 
and (5), and the sequence description generated 
in (6), and produce measures of accuracy and 
performance. 

8. If the performance of the ensemble is “good”,
the description of the computer system is in
such form that it may be fed into a more detailed
design process. If not, the designer may
alter his architecture.

To satisfy the above needs, Digital Control Design 
Language (DCDL) has been implemented as part of 
design automation research being conducted at the 
University of California at Los Angeles. 2-6 A compiler 
for DCDL has been implemented for the SDS SIGMA
7 using a META 5 compiler writing system. 6,7 The
DCDL compiler is currently also being implemented 
for the IBM/360. 

The DCDL system illustrated in Figure 1 contains 
two compiler processors written in META 5, a pseudo 
machine (which is the subject of this paper) written in 
FORTRAN IV and the machine language, and two 
control implementation modules written in FORTRAN 
IV. The input processor is a DCDL syntactic analyzer (1);
this program translates the digital system description
(example in Part IV) into an interpretive code
used by the pseudo-machine for simulation of the
described hardware. The second META 5 processor (2)
produces a numerical code which is then transformed
into a binary control program and a fabrication description
of a control subsystem for the computer system
being designed. The implementation specifications for
the wiring of the control matrices are produced by the
two FORTRAN IV programs (3,4). Control modules
implied by microprograms have their wiring lists automatically
generated by the Control Matrix Processors
in DCDS. The hardware construction of the control
processor is then effected by using a set of one or more 
similar building blocks (Control Matrix Building 
Blocks), according to wiring specifications given by 
DCDS. 



Figure 1— DCDS system flow chart. 


The software module described in Part III is a pseudo 
machine (5) in charge of executing simulation runs. 
The pseudo machine is composed of a combination of 
FORTRAN IV and machine language subroutines. The 
simulation runs are designed to check test cases in 
order to assess the validity of a described design as well 
as to calculate its estimated execution time. 

DCDS is designed to analyze asynchronous as well 
as clocked systems, with the former posing a special 
problem: dynamic reevaluation of variables. Any time 
a logical variable is changed, the system must, as a 
consequence of this change, reevaluate any other 
variable which is a function of the changed variable. 
This process must continue until no further “conse- 
quential changes” occur. 
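The reevaluation process described above amounts to propagating changes until nothing further changes. A minimal sketch follows; the rule format, the names `propagate`, `rules` and `values`, and the step limit that guards against circuits which never settle are all assumptions for illustration and are not taken from DCDS.

```python
def propagate(values, rules, changed, max_steps=1000):
    """Reevaluate dependent variables until no consequential change occurs.

    values  : dict mapping variable name to its current logical value
    rules   : dict mapping variable name to (input names, function)
    changed : names of the variables that have just been altered
    """
    pending = list(changed)
    steps = 0
    while pending:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("variables did not settle")
        source = pending.pop()
        for name, (inputs, func) in rules.items():
            if source in inputs:
                new_value = func(*(values[i] for i in inputs))
                if new_value != values[name]:
                    values[name] = new_value
                    pending.append(name)   # a consequential change
    return values


# Hypothetical network: c = a AND b, d = NOT c.
rules = {
    "c": (("a", "b"), lambda a, b: a and b),
    "d": (("c",), lambda c: not c),
}
values = {"a": False, "b": True, "c": False, "d": True}

values["a"] = True                 # the initiating change
propagate(values, rules, ["a"])
```

The change to `a` forces `c` to be reevaluated, and the change to `c` in turn forces `d`; the loop ends when a reevaluation produces no new value.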

DCDS’s capability to dynamically reevaluate vari- 
ables allows the designer to describe his system using 
the same logical equations and timing relations which 
he uses to implement it. Programming in a form (see 
Part IV) which is highly related to the actual hardware 
provides for a system directly used by the designer 
eliminating the programmer as a “middle-man”. This 
direct correspondence also makes the DCDL program 
an up-to-date documentation of the system designed. 
The syntax analyzer accepts a description which images 
the hardware and translates this description into 
simulation code. Thus the designer is freed from the 
tedious job of programming the structure of the model 
required — a process sometimes more time-consuming 
than building a hardware prototype and testing it on 
the bench. 

The Digital Control Design Language (DCDL) 
is built as a cluster of three main sublanguages: a 
language intended for expressing Boolean equations 
and time relations; a microprogramming language; 
and an algorithmic language. DCDL uses FORTRAN 
as the algorithmic sublanguage. The user may choose 
any one of the three sublanguages to describe any of 
the parts or modules in the described design. The logical
and microprogramming sublanguages use the same 
declarations and access the same variables by their 
names. The execution statements of sublanguages and 
their syntactic formats differ and one cannot combine 
statements of different sublanguages. Thus DCDS 
provides the user with a powerful means of expression, 
since he can select the most convenient and expressive 
form from among the three sublanguages to describe 
a hardware module. 

Entities and attributes in simulation systems 

For our observations herein, we consider the simulation
of a system to be the modeling and associated
measurement of a system by a STRUCTURE in which 
EVENTS occur in TIME according to a set of RULES. 
Thus there are four sets of basic elements which must 
be dealt with in simulation: 

STRUCTURES, EVENTS, TIMES, and RULES 

Different simulation methods neglect one or more of 
these sets (e.g., time independent models). Any one of 
the four sets may be selected as primary entities and 
the others treated as attributes of that set. 

One may choose to consider an analytic closed form 
solution to be a simulation of a real system. In this 
case, the process of simulation becomes a transforma- 
tion. Assume for example the transfer equation for an 
electronic circuit. Both internal events (voltages and 
currents in the individual elements) and structure 
(topology of the circuit) may be neglected and one 
manipulates the set of rules (i.e., Kirchhoff’s laws and
Ohm’s law) to produce a transfer function which
gives the output events as a function of time and input
events.

Thus whenever the rules are considered to be the 
main entities, then either an analytic transformation or 
an algorithmic procedure is used for simulation. The 
type and form of the information transferred into the 
simulation system as well as the simulation systems 
themselves vary from one another depending upon 
which of the four sets was chosen as the main set of 
entities. Due to these differences, different languages 
or input rules are used to describe the simulated system 
to the software package designed to perform the simu- 
lation. 

The following examples of different programming 
structures will serve to illustrate the previous discussion. 

Main Entities — EVENTS 

Examples of programming structures: 

SIMULA [8], GASP [9], SIMSCRIPT [10], [11],
[12], [13], GPSS [14].

A simulated system is described by an event flow 
chart. The programming systems above use input 
language formats suitable for the description of events 
in such a form. 

Main Entities — RULES 
Examples: 

NASAP [15], LISA [16], Boolean Analyzer [17]. 



The input to circuit analysis programs like NASAP 
and LISA or to the Boolean Analyzer is in table-form 
which either explicitly gives the set of rules (Boolean 
equations) or gives a table that implies a unique set
of rules (Kirchhoff’s and Ohm’s equations for the circuit).

Main Entities— STRUCTURES 

Examples: 

LOGIK [18], Weather Simulation Program [19]. 

Partial Differential Equation Simulation [20]. 

The input format is any form suitable for describing 
the physical or hierarchical structure of the simulated 
system. 

Modeling and approximations 

After the selection of the entity and attribute rela- 
tions, the next step for simulating a system is to decide 
what can be approximated and how the selected ap- 
proximations can be done. The choice of what to ap- 
proximate can be categorized as: 

a. making certain entities (inputs) constants; for 
example t = 0 in time independent modeling. 

b. neglecting parts of the attributes; for example 
in simulation of partial differential equations by 
Monte Carlo methods, the field constants are 
calculated for only a small number of selected 
field points in the structure. 

c. modifying the set of rules; the use of difference 
equations to solve partial differential equation 
problems is an example of modifying the rules. 
For a different example of rule modification, 
consider a simulation program simulating an- 
other program on a digital system. The purpose 
of the simulated program is to execute a matrix 
inversion in which the inversion is performed 
on a 2X2 part of the matrix instead of the entire 
nXn array. In this case, the system rules may be 
modified to obtain fast simulation time for a 
simulation that “takes the system through the
motions” without obtaining the actual numerical
result. Thus for such approximations,
one may simulate the system faster than real 
run time. 

Event directed simulation can be expected to be faster 
than structural simulation since structure simulation 
has to go through all possible events in the system, 
while event simulation takes the system only through 
the prescribed events. This is, of course, also the main
pitfall of event simulation; it does not point out events 
that might occur in the system but are unforeseen by 
the programmer. 

DCDS pseudo machine simulator 

A computer module in DCDL may be described by 
its structure (LOGIC), by the set of events that it con- 
trols (PROGRAM), or by the algorithmic rules 
(SIMULATE). In order to perform this task, the DCDS 
pseudo machine simulator operates as an algorithmic 
simulator by calling on the FORTRAN programs; as 
a structure simulator when simulating a logical struc- 
ture (operating from the Call Stack); or as an event 
simulator when processing a microprogram. The 
Program Stack (see Figure 2) operates the sequence 
of events generated by the control microprogram. The 
Call Stack operates all the logical details occurring in 
the logical structures forced by the control events. 

The DCDL event simulation is limited to operations
within a logical structure. The events that are generated
by the control as time moves forward force the simulator
to follow all consequences of the events within
the described logical structure. For example, the event
simulator may directly order (by executing an instruction
in the program stack) transfer of data to
register A. All the other consequences of this action
(i.e., all the outputs of gates whose input is A) are
simulated from the Call Stack (structure simulation).

The pseudo machine program 

A pseudo machine processor is a program written in 
machine language or higher level language for the 
machine on which one performs the simulation runs. 
In the present implementation on the SIGMA 7 this 
program is written using FORTRAN and assembly 
language. 

The process in which the translation is separated 
from the simulation allows one to write the translator 
program independently of the machine in use. The 
separation of the compiler program and the pseudo 
machine program allows independent debugging and 
changes in each. Modifications in DCDL and its 
compiler are done by changing the META 5 compiler 
program. FORTRAN changes in the pseudo machine 
provide for changes in simulation methods as well as 
insertion by the designer of other features expressed 
in FORTRAN to capture event information relevant to 
one design or another. 

Thus, by the process of programming in DCDL and 
by translation one obtains:

A. Documentation of the design; 

B. A check on the consistency and completeness 
of all logical variables and all logical functions;

C. Automatic implementation of control sections; 

D. Simulation runs for given sets of input data; and 

E. The amount of time a certain run will take on 
the described design. 

Following is a discussion specifying the pseudo 
machine structure and operation codes. 

Instructions, interpretation, addressing, and 
indexing 

This unit contains the following parts (see Figure 2). 

(a) Time counter and time registers.

The counter counts simulated execution
time. The time registers are used to store
time counts of different parallel branches.
At a parallel junction, comparison between
duration of operation on each branch is made
and the highest time count will be the new
value of the simulation time counter.

(b) Indexing arithmetic unit.

This unit is capable of fixed point operation
(plus, minus, multiplication, and division)
and is used for indexing arithmetic.

Figure 2 — Pseudo machine structure

(c) Call-stack and Program-stack. 

Two push down (LIFO) stacks. One of the 
elements in the stacks is the operative ad- 
dress; i.e., the address of the instruction to 
be executed next. The operative address 
is usually the word at the top of the call- 
stack. If the call-stack is empty, the opera- 
tive address is the word at the top of the 
program-stack. 

A control branch to a lower (subordinate) 
control level (CALL) is instrumented by 
putting the first address of the lower control 
level program into the call stack, thus making 
the call address the operative address. When 
the lower control level is of type PROGRAM, 
the address is put in the program stack. 
The operative address is incremented by 
1 after an instruction is executed or the 
address is replaced by another due to the 
execution of a branch (a normal branch that 
occurs within the program being executed). 

An exit or return from the subordinate program will 
cause the stack to pop while a further entry into another 
subordinate program brings a new address into the call 
stack. The consequential calls are put into the call 
stack but their execution is delayed until all the parallel 
operations have been carried out and then all conse- 
quential calls are carried out. Two key words in DCDL 
indicate parallel structures. *GROUP indicates a set 
of similar modules operating in parallel and controlled 
by the same binary control variable (for example, 
a set of 32 single bit adder modules in a 32 bit binary 
adder). *PART indicates a set of dissimilar modules 
operating in parallel under the control of a single 
binary control variable (for example, shifter and counter 
in floating point normalization). A *PART may contain
simple and nested *GROUPs, in which case the
whole structure operates simultaneously under
the supervision of a single control variable. The stacks
have three pointers: TOPC (top of the call stack), TOPP
(top of the program stack), and OPR (the operative
address).

OPR = TOPP if call stack is empty
OPR = TOPC if call stack is not empty
OPR = TOPC at the time of entry to *GROUP
or *PART if executing inside a *GROUP
or a *PART.
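
The three pointer rules can be sketched in Python as follows. This is an illustration only, not the Sigma 7 implementation: the class name PseudoStacks, its method names, and the choice to model OPR as a (stack, index) pair are assumptions made here, as is the fallback to the program stack when the call stack is empty on entry to a parallel structure.

```python
class PseudoStacks:
    """Hypothetical model of the call stack, program stack, and OPR pointer."""

    def __init__(self):
        self.call = []      # call stack; TOPC is the last element
        self.program = []   # program stack; TOPP is the last element
        self.group = 0      # nonzero while executing inside *GROUP or *PART
        self.held = None    # call-stack index frozen on entry to *GROUP/*PART

    def enter_parallel(self):
        # On the outermost entry, remember where TOPC was at that moment.
        if self.group == 0:
            self.held = len(self.call) - 1
        self.group += 1

    def leave_parallel(self):
        self.group -= 1
        if self.group == 0:
            self.held = None

    def opr(self):
        """Return (stack name, index) of the slot holding the operative address."""
        if self.group != 0 and self.held is not None and self.held >= 0:
            return ("call", self.held)            # frozen at entry time
        if self.group == 0 and self.call:
            return ("call", len(self.call) - 1)   # OPR = TOPC
        return ("program", len(self.program) - 1) # OPR = TOPP
```

Addresses pushed onto the call stack while the group counter is nonzero do not move OPR in this model; they become operative only after the outermost parallel structure is left.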


Consequential calls are intended for the dynamic 
reevaluation of variables. The STORE instruction 
invoking the consequential calls puts new addresses 
of variable reevaluation routines into the Call Stack. 
This is accomplished according to the following steps: 

1. The old and the new value of the variable are 
compared. 

2. The new variable value is stored. 

3. If the comparison mentioned above shows a 
difference between the old and new value, the 
address of the subroutine that calculates the
new value of the dynamically dependent vari- 
able is put into the Call Stack. 

4. The address of the next instruction is the address 
on the top of the Call Stack. Thus, if there were 
any consequential calls, they would be executed 
prior to the completion of the execution of the 
subroutine that invoked those consequential 
calls. 

When there are no more changes in the values of the 
variables, the instructions proceed to the end of the 
reevaluation routine, which contains RETURN as 
the last instruction. The RETURN instruction pops 
the Call Stack sending the program to finish operations 
in the routine which invoked the consequential calls. 

The process of dynamic reevaluation will stop only 
if the variable values and the logical functions are 
consistent. Assume the following statements : 

A = ∧ (B, C);

D = ∨ (A, E);

B = ¬ D;

with initial conditions A = 0, B = 1, C = 0, D = 0, 
E = 0. This set of relations and values is consistent. 
Now consider that the variable C is changed to one. The 
new set up of variables and relations is inconsistent 
and the reevaluation of variables will not reach a 
steady state. Each reevaluation will put a new address 
in the call-stack. 
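
The behavior of this example can be reproduced with a small Python sketch. It assumes the three relations read as A = AND(B, C), D = OR(A, E), and B = NOT D; the names reevaluate, FUNCTIONS, DEPENDENTS, and the step limit are hypothetical and only stand in for the pseudo machine's call stack mechanism.

```python
# Each variable is recomputed from the current values (assumed reading of
# the three statements: A = AND(B, C), D = OR(A, E), B = NOT D).
FUNCTIONS = {
    "A": lambda v: v["B"] & v["C"],
    "D": lambda v: v["A"] | v["E"],
    "B": lambda v: 1 - v["D"],
}

# Consequential calls: which variables must be reevaluated when a key changes.
DEPENDENTS = {"B": ["A"], "C": ["A"], "A": ["D"], "E": ["D"], "D": ["B"]}


def fresh_values():
    """Initial conditions given in the text."""
    return {"A": 0, "B": 1, "C": 0, "D": 0, "E": 0}


def reevaluate(values, functions, dependents, changed, limit=20):
    """Propagate a change; return True if a steady state is reached.

    `limit` is a hypothetical cap on reevaluation steps so that an
    inconsistent set of relations is reported instead of looping forever.
    """
    call_stack = list(dependents.get(changed, []))
    steps = 0
    while call_stack:
        if steps >= limit:
            return False
        steps += 1
        name = call_stack.pop()
        new_value = functions[name](values)
        if new_value != values[name]:
            values[name] = new_value
            call_stack.extend(dependents.get(name, []))
    return True
```

With C changed to one the sketch never settles (A, D, and B keep toggling) and the function returns False; changing E to one instead settles after two consequential calls.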

A change in operation occurs once an address is
put into location n in the stack. The pseudo machine
prints an error message which is followed by the names
and values of variables partaking in a STORE instruction.
This process continues, allowing the program to
put addresses in the next ten slots of the call stack.
When the execution calls for storing an address at
n + 11, the call stack is cleared (TOPC = 0) and the
operative address is taken as the instruction on top
of the program stack. This debug feature allows the
program to check for logical inconsistencies without
getting into an infinite loop or having to stop simulation
runs.

5. Delay table. The result of a logical transformation
specified in DCDL can be effected directly
or after a specified time, for example in the
statement

A = ‘DELAY (3)’ & (C, D, E): OP1;

the transformation &(C, D, E) is performed if
control variable OP1 is activated, but the content
of A will be changed only three time units
later.

To facilitate translation of the delay modifier, the
pseudo machine contains a delay table. An entry in
the delay table contains three parts: variable name,
variable’s new value, and time of exit.

Variable name    Variable value    Exit time

Each time the time counter is incremented, all time
of exit entries in the delay table are checked, and
each entry with a time of exit matching the time counter
activates a store operation storing the new value in
the appropriate variable, invoking consequential calls
if such are present.

Logical manipulating accumulators

The pseudo machine contains two string accumulators,
A and B. The machine performs the operations
of AND, OR and EQUAL between the respective bits
of the string accumulators and the result is stored in
string accumulator A. The current size of both string
accumulators is given by the content of the String
Accumulator Size Register (SASR).

All operations are performed on words of the same
size. Calling an operand of the wrong size causes an
error message printout and the machine goes to the
next instruction. An exception to this occurs when the
operand is of size one bit. In this case, the one bit is
extended to a word that contains all zeros or all ones
of the size indicated by the SASR. A special instruction
sets the size of the string accumulator (i.e., the content
of SASR), thus setting the size of all following logical
operations.

Data Blocks

Data blocks have different lengths and contain
binary arrays. A binary array can possess up to three
dimensions. Only a single bit or a binary word string
can be addressed in the blocks. Each data block contains
a two word header containing the variable name
followed by the structure described below.

Storage for a Single Bit

The storage block for a single bit is one word (four
bytes) plus a word for each consequential call. A
consequential call occurs when a variable A is a dynamic
function of a variable B. B forms the input to
the gate, the output of which is A. When B is changed,
a consequential call causes the pseudo-machine to
reevaluate the variable A. Thus, the storage location
of variable B contains the addresses of sets of instructions
which will reevaluate all variables which are
dynamically dependent on the variable B.

The single bit storage word format

[Diagram: one four-byte word. Byte 1: number of consequential calls; byte 2: indicator flags; byte 3: not used; byte 4: variable value.]

Byte 1: number of consequential calls
invoked by a change in the
stored binary variable.

Byte 2: this byte contains indicators for
high bit position, number of
dimensions of the logical variable,
and variable type. Each
indicator occupies two bits.

[Diagram: bit layout of byte 2.]


variable dimension 

00 : bit variable 

01: one dimension array (word) 

10: two dimensional variable 

11: three dimensional variable 

variable type 

00: logical point, the variable does not 
contain memory 

01: 1 level storage, declared as *RS 

10: 2 level storage (clocked) 

position of the high order bit 

00: the high order bit is the most significant bit 
(leftmost bit) 

01: the high order bit is the least significant bit 
(rightmost bit) 

Byte 3 : not used 
Byte 4 : variable value 

The following words (if any) contain the consequential 
call address in byte 3 & 4 and its directive in byte 1. 


[Diagram: array storage layout, showing the header word, the variable storage words, and the consequential call words (cc directive and cc address).]


First word : Byte 1 : number of consequential calls 

Byte 2 : variable dimension, high order 
bit position and variable type 
(same as for bit storage) 

Byte 3 : lowest subscript of variable 
Byte 4: size of variable. 


The second word through the nth word, where
n = (word size + K)/32, contain the value of the binary
word. If the variable is a clocked F/F, the amount of
space for variable storage is doubled and each bit has
two storage locations, primary and secondary.

The last set of words contains consequential call 
addresses and their directives. 


Byte 1: consequential call type (directives)

011: calls on any change in the variable

001: consequential call, only if the variable
changes from 0 to 1

010: consequential call, on the change of the
variable from 1 to 0

1xx: consequential call of an entry to a PROGRAM;
puts a new address on top of the
program stack (operation on the last 2 bits
same as above).
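
One way to read the directive codes is as a three-bit field: the leading bit selects the target stack and the last two bits select the triggering transitions (011 for any change, 001 for 0 to 1, 010 for 1 to 0). The Python sketch below follows that reading; the function names decode_directive and fires are hypothetical.

```python
def decode_directive(bits):
    """Split a three-bit directive into (target stack, on_rise, on_fall)."""
    target = "program" if bits & 0b100 else "call"
    on_rise = bool(bits & 0b001)   # call fires on a 0 -> 1 change
    on_fall = bool(bits & 0b010)   # call fires on a 1 -> 0 change
    return target, on_rise, on_fall


def fires(bits, old, new):
    """Return True if a change of a one-bit variable triggers the call."""
    _, on_rise, on_fall = decode_directive(bits)
    if old == new:
        return False               # no change, no consequential call
    return on_rise if new == 1 else on_fall
```

For instance, fires(0b001, 0, 1) is true while fires(0b001, 1, 0) is false, and decode_directive(0b101) selects the program stack for a rising change.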

One dimension array storage 

In a one dimensional binary array storage, the first 
word contains the range and type of the stored variable. 
The following words contain the binary variable and
then the consequential calls (if any).


Two dimensional binary storage 


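[Diagram: two dimensional storage layout. The header occupies bytes 1 through 6 and is followed by the variable storage words and the consequential call words (cc directive and cc address).]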


A two dimensional arrangement contains at least 3 
words. The first 2 words are used for bookkeeping in the 
same format as the 1 dimensional arrangement, with 
byte 5 indicating the lowest value of the second 
subscript, and byte 6 indicating the range of the second 
subscript. 

Three dimension


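[Diagram: three dimensional storage layout. The header occupies bytes 1 through 8 and is followed by the consequential call words (cc directive and cc address).]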


In a three dimensional arrangement, byte 7 indicates
the lowest value of the third subscript and byte 8
indicates its range.

Arithmetic variable storage 

The third entity stored in pseudo memory is a block 
of 256 arithmetic variables used for indexing and ad- 
dress manipulations. 

Temporary logical variables

The memory contains a block of 256 one dimen- 
sional logical temporary variables, each one 128 bits 
long. 

Pseudo machine instruction set 

Most of the pseudo machine instructions closely 
resemble general purpose computer instruction lists. 
The main exception is that the addresses of logical 
variables contain the variable address as well as bit 
and word indices. 

In the following paragraphs we will discuss specific 
instructions which are unique to the DCDL pseudo 
machine and will give the reader more insight into 
DCDS simulating programs. 

A pseudo machine logic instruction is contained in 
a 64 bit word (eight bytes). 

As implemented on the SDS Sigma 7, the most 
common format of the pseudo-machine logic instruc- 
tion code contains 

a. operation code (one byte) 

b. operation code modifiers (one byte) 

c. operand address (two bytes) 


d. three address subscripts and a set of subscript 
tags. 

The actual operand address is a function of the 
main address (i.e., array address), the three subscripts, 
and the subscript tags. The main address corresponds
to the name of the data block (i.e., the name of the
variable). The subscript tags indicate whether the
subscripts are to be used directly, indirectly, or by
word size.

Each index byte has a two bit tag. The interpretation
of the tag is:

If the tag is 00: The subscript is not currently effective.
For example, in A(1, 3), A is a two dimensional
array and the third index is not used.

If the tag is 01: The subscript is indicated directly
by the numerical content of the corresponding
subscript byte.

If the tag is 10: The subscript is given indirectly;
i.e., the corresponding number is the location
of an indexing word in memory.

If the tag is 11: It is used for word variables and
the word is the entire range of this subscript.
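
The four tag values can be summarized in a short Python sketch. It assumes that tag 10 denotes indirect addressing through an indexing word, which is how the description of that tag reads; the function name resolve_subscript and its parameters are hypothetical.

```python
def resolve_subscript(tag, byte_value, index_words, full_range):
    """Return the subscript values selected by one index byte and its tag.

    tag         -- two-bit tag attached to the index byte
    byte_value  -- numerical content of the subscript byte
    index_words -- mapping from memory location to indexing word (assumed)
    full_range  -- all legal values of this subscript
    """
    if tag == 0b00:
        return None                          # subscript not in effect
    if tag == 0b01:
        return [byte_value]                  # direct: the byte is the subscript
    if tag == 0b10:
        return [index_words[byte_value]]     # indirect: byte locates an index word
    if tag == 0b11:
        return list(full_range)              # whole word: every position
    raise ValueError("tag must be a two-bit value")
```

A result of None means the array has fewer dimensions than the instruction has subscript bytes, as in the two dimensional example A(1, 3).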

The following section contains pseudo machine in- 
struction examples from the set of pseudo machine 
instructions. 

Store with invoked consequential calls 

STDC (a): (a) <- A, Call Stack <- consq (a)

If there is a difference between (a) and A, all the 
consequential call addresses associated with (a) 
are put into the call stack. To avoid redundant 
operation, a duplication of the address already 
inside the call stack will not be inserted; i.e., when 
two or more successive operations request the 
same consequential call this mechanism sets the 
operation such that the call will be executed 
only once. When the receiving variable (a) is a 
clocked element (two storage levels) both levels 
change to match the content of A. 

Store in secondary level

SSEC (a): (a1) <- A

Stores into first level of a clocked storage element
(a clocked element has two storage levels). This
instruction does not initiate consequential calls.

Secondary to primary storage level 
transfer, entire array 

TRANS (a): (a2) <- (a1)

Transfers the data from secondary to primary
level in clocked memory elements. This instruction
initiates consequential calls if consequential call
addresses are present and the content of primary
and secondary differ.

Secondary to primary transfer, only 
designated bit(s) 

BTRANS (a): (a2) <- (a1)

Instruction execution same as above except 
transfer is performed only on bit(s) designated by 
the instruction. Note : consequential calls are not 
associated with single bits; a change in a variable 
invokes all consequential calls for the array. 
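
The interplay of the store and transfer instructions on a clocked element can be sketched as below. The class ClockedElement and its attribute names are hypothetical; the sketch models a single one-bit element and returns the list of consequential calls that would be placed on the call stack.

```python
class ClockedElement:
    """Hypothetical model of a two-level (clocked) storage element."""

    def __init__(self, value=0):
        self.secondary = value   # level written by SSEC
        self.primary = value     # level seen by dependent logic

    def ssec(self, a):
        """SSEC: store into the secondary level; no consequential calls."""
        self.secondary = a

    def trans(self, consequential_calls):
        """TRANS: copy secondary to primary; return the calls invoked."""
        changed = self.primary != self.secondary
        self.primary = self.secondary
        return list(consequential_calls) if changed else []

    def stdc(self, a, consequential_calls):
        """STDC: store into both levels; return the calls invoked."""
        changed = self.primary != a
        self.primary = a
        self.secondary = a
        return list(consequential_calls) if changed else []
```

Separating the two levels lets every clocked element load its new input first and change its visible output only at the transfer, so elements that feed each other read consistent old values.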

Delayed storage 

DELAY (a), i : DELAY TABLE <- a, i, A 

i, the delay count, is put in the second byte of the 
eight byte instruction (as a modifier). Delayed 
storage invokes consequential calls when they are 
associated with the stored variable. The conse- 
quential calls as well as storage will be activated 
after i time units. 

Instruction format 

[Instruction format diagram: eight bytes, numbered 1 through 8.]

Byte 2 : delay count 

Byte 3-8 : logical variable address 

Delayed secondary to primary transfer 

CKDLY (a), i: DELAY TABLE <- a, i

This instruction stores the address and time count 
in the Delay Table. The variable value does not 
have to be stored in the Delay Table since it is 
stored in the secondary register of the variable. 

*PART entry point 

PARTIN: 

Changes the GROUP flag to 1. As long as the
GROUP flag is not equal to zero (GROUP ≠ 0)
the operative address does not change due to the
placement of an address in the call stack.

*PART exit point 

PARTOUT: 

Turns the GROUP flag to “0”, thereby releasing
the consequential calls mechanism. Thus, if
consequential calls have been invoked within
*PART, this instruction causes the effective address
to be the top of the stack and execution of consequential
calls to begin.

*GROUP entry point 

GRUPIN, K1, XR:

Loads the value K1 into the arithmetic variable
serving as index register (XR).

Increments the GROUP flag by one (GROUP =
GROUP + 1).

Format

[Instruction format diagram: eight bytes, numbered 1 through 8.]

Byte 4: arithmetic variable serving as index
register (XR).

Bytes 7&8: number (K1) loaded into the index
register (XR).

*GROUP exit point 

GROUP, K2,i,n, XR: 

(1) Compares K2 with the value stored in the 
appropriate index register (XR). 

If the values are equal: 

Decrements the GROUP flag (GROUP = 
GROUP — 1) and proceeds with the execution of 
next instruction. Note that if GROUP flag is 
decremented to zero (GROUP = 0) the stack 
pointer is moved to the highest occupied position 
POINT-TOP and stored consequential calls are 
executed. 

If the values are not equal : 

The index register variable XR is changed by 1 
or by —1. 

The operative address (next instruction address) 
is changed to the value n. 

[Instruction format diagram: eight bytes, numbered 1 through 8.]

Byte 2: (i) Incrementing or decrementing value
(1 or -1)

Byte 4: (XR) Address of index register

Bytes 5&6: (n) Label of the instruction at the
top of the *GROUP loop

Bytes 7&8: (K2) upper limit of index register.

The operative address cannot change as long as
execution is within a *PART or *GROUP
(GROUP ≠ 0). The consequential calls will be
stored in the call stack and evaluated once the
program exits all the nesting of *GROUP and
*PART.

Unconditional branch 

GOTO n:

Unconditional branch to n: the value n replaces 
the operative address. 

Conditional branch 

GOTO (k)n: 

Branch is taken if the logical accumulator A = 0
and k = 0, or A ≠ 0 and k = 1. When the branch
is taken, n replaces the operative address in the
CALL or PROGRAM Stack.

Call 

CALL n : 

Control transfer. The label n is put on top of the 
call stack making it the new current operative 
address. 

Return from a substructure 

RETURN: 

The instruction causes the call stack to pop making 
the next label in the stack the operative address. 

Call microprogram controller 

CALP(n): 

puts (n) on top of the program stack 

Return from a microprogram 

RETRNP: 

Pop the program stack 

Check bit 

CHECK (a) 

The instruction contains a bit indicator (byte 2). 
The bit indicator is compared with a bit in memory 
addressed by bytes three-eight. If the bits are the 
same, the result is no operation; if the bits are 
different, the instruction executes a RETURN. 


Count time 

TIME, n: (Timer) <- (Timer) + n, Evaluate delay
table.

Counts n time units; note that with each count 
the delay table will be reevaluated and the in- 
struction will activate delayed storage. 
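
The cooperation between the timer and the delay table can be sketched as follows. The class DelayTable and its method names are hypothetical, and consequential calls are left out to keep the sketch short; each table entry holds the variable name, its new value, and its time of exit.

```python
class DelayTable:
    """Hypothetical model of the timer together with the delay table."""

    def __init__(self):
        self.timer = 0
        self.entries = []     # list of (name, new_value, exit_time)
        self.variables = {}   # current values of the simulated variables

    def delay(self, name, value, i):
        """DELAY: schedule `value` to be stored in `name` after i time units."""
        self.entries.append((name, value, self.timer + i))

    def time(self, n):
        """TIME: count n time units, checking the table at every count."""
        for _ in range(n):
            self.timer += 1
            due = [e for e in self.entries if e[2] == self.timer]
            self.entries = [e for e in self.entries if e[2] != self.timer]
            for name, value, _exit_time in due:
                self.variables[name] = value   # the delayed store takes effect
```

Checking the table on every single count, rather than once after adding n, is what allows a store scheduled in the middle of an interval to take effect at its own time of exit.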

Store timer 

TIMS (n) : (n)<- (Timer) 

Stores the content of the timer in n 

Return to time count routine 

TRET: 

This instruction pops the call stack then returns 
control to the timer control subroutine. 

Bring timer 

TIMI(n): (Timer) <- (n) 

Sets the timer according to the value stored in n.

Set timer

TIMO n, m, k: (Timer) <— n, (timer subroutine) 
<— m, k. 

The instruction contains a new initial value for 
the timer. 

Gather point for parallel branches in a 
microprogram 

GATHER (b), j, k: 

This instruction appears at the gather point of
parallel operation. The instruction contains two
numbers, j and k, each stored in a two byte location
and used for parallel branch count. k contains
the total number of parallel branches coming in
to the gather point; j contains the number of
branches not yet executed. The arithmetic variable
b is used to store the maximum operation time
on the parallel branches.

Operation: if j ≠ 0

a. j <- j - 1

b. (b) <- MAX ((b), (timer))

c. Pop the call stack

if j = 0

a. j <- k

b. (timer) <- MAX ((b), (timer))

c. (b) <- 0

d. go to next instruction (past parallel gather)

[Instruction format diagram: eight bytes, numbered 1 through 8.]

Byte 4: Arithmetic variable storing time count 
5&6: value of k, total number of parallel 
paths 

7&8: value of j, number of parallel paths 
to be executed 
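
The gather operation can be sketched in Python as shown below. The class name Gather and the method arrive are hypothetical, and the initial value of j is taken as a parameter because the text does not state it; the timer value at each arrival is passed in by the caller.

```python
class Gather:
    """Hypothetical model of the GATHER instruction's counters."""

    def __init__(self, k, j):
        self.k = k   # total number of parallel branches
        self.j = j   # branches not yet executed (initial value assumed)
        self.b = 0   # maximum operation time seen so far

    def arrive(self, timer):
        """Process one branch reaching the gather point.

        Returns (proceed, timer). When proceed is False the caller pops
        the stack and runs the next branch; when True, execution goes on
        past the gather point with the returned timer value.
        """
        if self.j != 0:
            self.j -= 1
            self.b = max(self.b, timer)
            return False, timer
        self.j = self.k
        timer = max(self.b, timer)
        self.b = 0
        return True, timer
```

With three branches taking 5, 9, and 7 time units and j starting at 2, the third arrival passes the gather point with the timer set to 9, the duration of the slowest branch.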

Logical to numerical variable transfer, 
first word 

SIN1 (n), (v): B <- (v), (n) <- B(0-31)

The content of the logical variable v is loaded
into the B accumulator. Then the rightmost bits
B(0-31) are loaded into the arithmetic variable
n. This arithmetic variable is to be transferred into
the simulated section. If the size of B is less than
32, zeros will be put into the leftmost bits of the
word.


Numerical to logical variable transfer, 
additional words 

SOUT2 (n), k: B(32*k to 31+32*k) <- n

Loads the content of (n) into the kth word of B. 
This instruction must be followed by SOUT1 or 
another SOUT2. 

Call simulation section 

CALSIM, n: B<— 0, CALL simulation section. 

Resets B, then activates the FORTRAN or ma- 
chine language simulation section, n is the number 
of the subroutine called. 

Error trap 

TRAP: 

This instruction must follow a conditional branch. 
The execution of the instruction consists of print- 
ing an error message and then following the branch 
of the previous instruction, even though the branch 
conditions were NOT satisfied. 


Logical to numerical variable transfer, 
additional words 

SIN2 (n), k: n <- B(32*k to 31+32*k)

This instruction must follow SIN1 or another
SIN2 instruction. The instruction transfers the
kth word from B to the arithmetic variable n
to be transferred into the simulation section.
Format

[Instruction format diagram: eight bytes, numbered 1 through 8.]

Byte 2: contains the address of the arithmetic
variable

Byte 4: k, position of the word in B.

Numerical to logical variable transfer,
first word

SOUT1 (n), (v): B <- n, (v) <- B, B <- 0

This instruction transfers the bits of an arithmetic
word n into the rightmost 32 bits of B, then stores
the content of B in v, and then resets B (the instruction
may invoke consequential calls if they
are associated with v). Byte 2 contains the
arithmetic variable address.

The logic design of a serial adder

Figure 3 gives the block diagram of a design specification
for a serial adder. The adder contains two clocked
shift registers, A and B, containing 16 bits each. Other
parts of the adder are a four bit counter COUNT, a
carry flip flop C, a single bit sum and carry logic, the
adder controller AUC, and a PANEL section.

The sum of A and B generated by the adder replaces
the content of B. A is connected to perform a cyclic
shift such that at the conclusion of the addition it
contains its initial value. The sum bit generated at each
cycle is stored in position B(16).

Figure 3 — Serial adder
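
The register behavior described here can be checked with a short Python sketch. It assumes that position 1 of each register holds the least significant bit and is the bit fed to the sum and carry logic on every cycle; the function name serial_add is hypothetical and the sketch is not a translation of the DCDL program.

```python
def serial_add(a, b, width=16):
    """Bit-serial addition; returns (final A, final B, final carry).

    A[0] and B[0] stand for positions A(1) and B(1), assumed here to be
    the least significant bits presented to the single bit adder.
    """
    A = [(a >> i) & 1 for i in range(width)]
    B = [(b >> i) & 1 for i in range(width)]
    C = 0
    for _ in range(width):
        s = A[0] ^ B[0] ^ C                          # single bit sum
        C = (A[0] & B[0]) | (A[0] & C) | (B[0] & C)  # single bit carry
        A = A[1:] + A[:1]    # cyclic shift: A returns to its initial value
        B = B[1:] + [s]      # shift B; the sum bit enters at position 16

    def to_int(bits):
        return sum(bit << i for i, bit in enumerate(bits))

    return to_int(A), to_int(B), C
```

After 16 cycles register A has rotated back to its starting content, B holds the 16 bit sum, and the carry flip flop holds the overflow, so serial_add(5, 3) gives (5, 8, 0).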

Design Example, Serial Adder 

Figure 4 contains a DCDL program specifying the 
serial adder. The program starts by declaring a UNIT 
named ADDER at control level #1. The declaration 
section starting with the key word *DECLARE 
specifies that the UNIT ADDER receives three control 
signals (ORDERs) from its supervisor(s). The ORDERs
are <A + B>, CNT and RESET. The functions
controlled by these ORDERs will be specified
later in the LOGIC part of this UNIT.

Other parts declared in this DECLARE section are 
the 16 bit register A, the 16 bit register B, and the 
flip flop C. A, B, and C are composed of clocked RS 
flip flops (type *CRS). The next declaration is a four 
bit register COUNT constructed from TRIGGER flip 
flops and a DATA BUS logic variable TEST. The 
value of the variable TEST will be specified in the 
LOGIC part as a logical function of memory elements. 


• UkjIT ADDER, level* 1 
•DECLARE 

♦ORDER <A+R>,CNT, RESET l 

♦CRS A ( 1 6 • 1 ) * ft ( 16 8 1 ) , C J 
♦TRIG 3 ER CO JNT< 481 ) J 
♦DATA BUS T-ST . J 

• END 
•LOGIC 

•part: cm , 

count< 1 )%«'xi* i 

C 9 uNT( 2 )^.C»uNT( 1 ) j 

COUNT ( 3 ) ?{■$ ( COUNT { 1 ) # COUNT { 2 ) ) j 

C 6 UNT(A)v.» UC 0 JNT{ 1 ),C?UNT( 2 >#C*UNT< 3 ) ) i 
•END 

•part: reset * 
c«»xo* ; 

counT(*)» , x:' j 

•end 

*part:<a+b>, 

a ( * > V» ’CYCLE ( • 1 ) 1 A ( * ) I 

3(16)Xa|(&(A(l)/B(l)/C># 5(-A( 1 W9< 1 )#C)/ 

5(-Ad)/B< Ui-Oi I 

♦GROUP I. I,t5 *$£T , 
ft(I)%»B(I*n l 

•END 

C\. I (&( A< 1),B( 1) )#&( A(l),C)# S( B ( 1 );, C ) > J 
•END 

TTST • S(C0 jNT(l),CeuNT{2),C0JNT(3)!,CBUNT(A)) j 

• END 

•END ADDFR 

•unit ajc, level-2 

•declare 

♦ORDER ADD i 

•RF p LY FIN ; 

•end 

*PRf>GRA?1 


add: reset: ai i 

ai :<A4?>:ap ; 

A2 :<A+o>,rNT : A3 j 

*3! *G9-T0 TEST: «A4,A ? ) j 

aa: •peuiRj fin ; 

•END ♦END AUC 
•PANEL AAA, LEVEL.3 

•SYSTE^RE'eT : A (•)« *X7P37», B ( • ) ■!* XHE CD ' J *TI v L«0; 
♦ AT_T I INTERVAL « P, WRITE A { ♦ ) : x,i =t ( * ) 8 X/ CGUMj*) 
♦ START ADD j •FINISH FJnj, •rND AAA 




Figure 4 — Serial adder DCDL program 


The declaration section ends with the key word *END. 

The logical and control relations in the ADDER 
UNIT are specified in the LOGIC section which starts 
with the key word *LOGIC. The LOGIC section con- 
tains three PART sections and one direct transfer 
statement. 

The first PART section is controlled by the ORDER 
VARIABLE CNT. This section contains the 
input statements to the four COUNT flip flops. The 
statements specify that the input to COUNT (1) is 
a “ONE” ('X1' specifies a one in a hexadecimal format). 
The input to COUNT (2) is the output of COUNT (1). 
Similarly the input to COUNT (3) is the AND of 
(COUNT (1), COUNT (2)) and the input to COUNT 
(4) is the AND of (COUNT (1), COUNT (2), COUNT 
(3)). 

The first PART section, controlled by CNT, uses 
clocked transfers (%=), which are associated with the 
clocked inputs of the registers' flip flops. The next 
PART section, controlled by the ORDER RESET, 
specifies a direct connection (=) into the clocked 
variables C and COUNT. Therefore, the PART controlled 
by CNT changes the clocked input of the COUNT 
register, whereas the PART controlled by RESET changes 
the content of COUNT and C using direct set (DC set) 
and direct reset (DC reset). 

The last PART section is controlled by the ORDER 
VARIABLE <A + B>. Activated by the <A + B> 
control variable are the following transformations: 

a. The content of A is cyclically shifted by one 
position to the right; the result is stored in A(*); 

b. B(16) receives the sum function of A(1), B(1), 
and C; 

c. The GROUP of bits B(1) to B(15) is shifted 
by one to the right; 

d. The carry flip flop C receives the carry, which is 
a function of A(1), B(1) and C. 

Note that the PARTs containing a clocked transfer refer 
to double rank clocked elements. Whenever the controlling 
variable is activated, the specified function 
(to the right of %=) is stored in the secondary rank 
of the variable to the left of %=. In the succeeding time 
unit, a primary secondary transfer is activated. 

The last statement in LOGIC is a dynamic specifi- 
cation of the variable TEST as an AND function of the 
bits of COUNT. 

The next UNIT to be specified is the adder controller, 
AUC. AUC introduces two new variables in its declara- 
tion section: an ORDER ADD which it receives 
from its supervisors, and a reply FIN which it sends 
back to the supervisors. 
The control function of AUC is specified by a microprogram 
in the PROGRAM section of AUC. The 
interpretation of the microprogram is as follows: 

a. When the controller receives the ORDER ADD, 
it issues the ORDER RESET. After the default 
time lapse, two time units, the controller 
switches to state A1; 

b. In state A1, the controller issues the ORDER 
<A + B>. After two time units, the controller 
moves to A2; 

c. At state A2 the controller issues two ORDERs, 
<A + B> and CNT. The next state is A3; 

d. A3 is a conditional branch. If TEST is “ONE”, 
the next state is A4. If TEST is “ZERO”, the 
next state is A2. The GO TO line is an internal 
control branch specification which does not 
require any additional cycle. Therefore the 
execution time of this line is zero time units; 

e. The last microprogram line states that when 
the controller is in state A4 it issues the REPLY 
pulse FIN, and returns to its zero state. 
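
The register transfers of the <A + B> PART and the state sequence of the AUC microprogram can be traced with a short bit-level simulation. The sketch below is written in Python, not DCDL, and is only an illustration. The function names are hypothetical, and it assumes that position 1 is the rightmost bit of each register and that TEST is examined after each CNT ORDER has taken effect.

```python
WIDTH = 16  # A and B are 16-bit registers


def add_step(a, b, c):
    """One <A + B> ORDER: process a single bit of the serial addition."""
    a1, b1 = a & 1, b & 1                     # A(1) and B(1), assumed rightmost
    s = a1 ^ b1 ^ c                           # sum function of A(1), B(1), C
    carry = (a1 & b1) | (a1 & c) | (b1 & c)   # carry function of A(1), B(1), C
    a = (a >> 1) | (a1 << (WIDTH - 1))        # cyclic right shift of A
    b = (b >> 1) | (s << (WIDTH - 1))         # shift B right; sum bit enters B(16)
    return a, b, carry


def run_adder(a, b):
    """Follow the AUC states: RESET, A1, then A2/A3 until TEST is ONE."""
    c, count = 0, 0                           # RESET clears C and COUNT
    a, b, c = add_step(a, b, c)               # state A1: <A + B>
    while True:
        a, b, c = add_step(a, b, c)           # state A2: <A + B> and CNT
        count = (count + 1) % 16              # four-bit binary counter
        if count == 15:                       # state A3: TEST = AND of COUNT bits
            return a, b, c                    # state A4: REPLY FIN


a, b, c = run_adder(0x1234, 0x0FF0)
print(hex(a), hex(b), c)
```

Under these assumptions the <A + B> ORDER is issued sixteen times (once in A1 and fifteen times in A2), so A returns to its initial value and B holds the 16-bit sum, with the final carry left in C.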

The highest controller in the structure is AAA 
PANEL at level 3. The PANEL specifies the system’s 
initial conditions (placing initial values in A and B) 
using the SYSTEM RESET statement. The initial 
condition for the timer is specified by the statement 
*TIME = 0. The key word *START indicates the 
initiating variable, and the key word *FINISH is 
followed by the variable signaling completion. The 
last statement in PANEL is *END followed by 
PANEL’S label AAA. 

More Complex Structures 

The above description has illustrated the use of 
DCDL to design a simple adder. The language and 
system have been used to design more complex struc- 
tures including a multiplier and special purpose logic 
card tester. 1 

CONCLUSION 

The scope of the DCDS study was limited to systems 
for which a set of predefined building blocks and a de- 
fined structure are present. A total design automation 
system requires programming tools capable of studying, 
simulating, and gathering statistics and thereby able 
to evaluate conjectures about the behavior of struc- 
tures and sequences of events before the details of 
the structures and events are known. We hope that 
further extension of DCDS and further study in simulation 
and modeling will add the capability to make 
conjectures based on systems less rigorously defined 
than DCDS presently requires them to be. 

The DCDL implementation by sublanguages which 
are compiled by META5 allows a simple insertion of 
other sublanguages designed to study the architectures 
of systems. The DCDL pseudo machine operates as 
a FORTRAN based simulator either to describe the 
simulated system or to augment the pseudo machine 
instruction set. 
INTRODUCTION 

There are many ways in which a pattern recognition 
system may be implemented. In the specific problem of 
speaker verification , 1 * 18 a two-class recognition scheme 
is of interest. A speaker who desires verification of 
his identity based upon some previously stored charac- 
teristics of his speech represents one of the two classes 
(real), whereas the other class (impostor) encompasses 
all other speakers. 

In implementing such a system, it is convenient, 
first, to obtain a representation for each of the utterances 
of interest in the form of a time-frequency-amplitude 
matrix. 2 - 3 The conventional method of deriving 
this representation is by means of a filter-bank 
analyzer. 2 - 3 Speech signals are fed to the analyzer 
and the outputs of the various filters are sampled and 
averaged over the appropriate time interval. This process 
generates a set of short-term average spectra with 
which to form the time-frequency-amplitude matrix. 

Normally, only those components of this matrix 
which contain significant speaker characteristics need 
be retained. Identification of such speaker-dependent 
components is somewhat arbitrary although several 
guide lines are available . 2 - 3 

The next step is to regard all the pertinent elements 
of the above-mentioned matrix as constituting a single 
vector. Thus, the net result of the previous processing 
steps is a vector representation for each utterance. 

At this stage, several mathematical and statistical 
tools may be applied appropriately to the data. For 
example, the vector representation of an utterance may 
exhibit high dimensionality. For further computational 
advantage, it is desirable to reduce this dimensionality 
of the vector. It is also helpful to achieve as much 
intra-class clustering and inter-class separation as 
possible. Methods such as analysis of variance, 4 
discriminant analysis 6 and mutual information 
calculation 10 are available for this purpose. The analysis of 
variance and mutual information methods can be 
conveniently used even if the initial dimensionality of 
the vectors is rather high. The disadvantage of these 
two methods is that each element of the vectors is 
considered independent of the other elements; this is 
not desirable since the interrelationships between the 
elements which may be important for the purpose of 
speaker verification are completely ignored. On the 
other hand, while discriminant analysis treats the 
vectors in multi-dimensional space, thereby preserving 
the interrelationships, the computation time required 
may be impractical if the vectors are initially of inap- 
propriately high dimensionality. 

Finally, a method for discriminating among the 
vectors of the real class and of the impostor class is 
required. This is usually done by means of a reference 
vector. There are again several alternatives here. For 
example, it has been pointed out that if a suitable 
representation for the impostor class is not available, 
it is possible to derive a reference vector based on the 
real class data only. 6 But, if the impostor class is 
properly characterized, Adaline-type linear threshold 
elements, 7 which attempt explicit discrimination among 
the real and the impostor classes, may be used to 
advantage. 
There are many other methods for feature selection 
and reference vector generation. 7 ’ 8 Each method has 
its particular advantages and shortcomings. In ad- 
dressing the speaker verification problem, it is con- 
venient to use the analysis of variance technique for 
feature selection and a modified form of the Adaline- 
type linear threshold device for deriving a reference 
vector. Previous unreported in-house experimentation 
has indicated that the two techniques, analysis of 
variance and mutual information calculation, produce 
rather similar sets of features. The rationale for using 
the modified form of the Adaline device will be presented 
in the next section. 

Another important aspect of the pattern recognition 
problem is to conduct a significant experiment with 
true data. Too often, for reasons of economy and time 
limitations, artificially generated data or a very limited 
quantity of true data is used to perform experiments. 
Experience has shown that conclusions based on such 
experimentation are often misleading. A primary 
achievement of the present experiments is the use of 
a large true data base. The value of this large data 
base will be further appreciated in the following sections. 

The next section outlines the modification of the 
Adaline-type linear threshold element. The analysis 
of variance technique, being a standard tool in statistics, 
is not treated here. The details of the experimental 
part are listed in the third section. Finally, some con- 
clusions and observations regarding the whole pro- 
cedure are made. 

Theory 

At first, in this section, a brief description of the 
classical Adaline 7 procedure will be given and some 
of its shortcomings will be pointed out. Next, a modified 
form of the above procedure will be developed. In the 
standard Adaline technique, 7 a reference (or weight) 
vector $W$ is derived by utilizing vectors from both 
of the two classes to be recognized, $C$ and $\bar{C}$. The 
vectors in the two classes are assumed to be linearly 
separable. 7 For convenience in describing the technique, 
the negatives of the vectors belonging to $\bar{C}$ are 
assigned to $C$. Next, denote the vectors that are now 
attributed to $C$ as $Y_1, \ldots, Y_m$, where $m$ is the total 
number of vectors. The Adaline procedure 7 is an iterative 
method of determining a weight vector, $W$, such 
that 

$$Y_j \cdot W > 0, \qquad j = 1, 2, \ldots, m \tag{1}$$

where the operator ( · ) signifies the inner product 
operation of two vectors. The iteration process is described 
by the rule 

$$W_{k+1} = \begin{cases} W_k & \text{if } Y_k \cdot W_k > 0 \\ W_k + Y_k & \text{otherwise.} \end{cases}$$

Using this weight vector $W$, a new test vector may be 
classified to $C$ or $\bar{C}$ depending on whether the inner 
product of the test vector with $W$ is greater than zero 
or not. The drawback of this procedure of decision 
making is that the test pattern vectors belonging to 
$C$ which would normally produce slightly positive 
inner products may, in the presence of some noise, 
lead to negative inner products and be misclassified. 
Similar statements may be made about the patterns 
belonging to $\bar{C}$. 

In order to avoid this difficulty, a weight vector $W'$ 
which satisfies the inequality 

$$Y_j \cdot W' > K\,|Y_j|^{\gamma}\,|W'|^{\alpha}, \qquad j = 1, 2, \ldots, m \tag{2}$$

$$0 \le K < \infty, \quad 0 \le \gamma < \infty, \quad 0 \le \alpha < \infty$$

may be tentatively proposed for classification. Clearly, 
the advantage of this inequality is that for non-zero 
K, a dead-zone is created. This zone is symmetric 
about zero and equal in magnitude to the right-hand 
side of equation (2). The dead-zone may be designated 
as an interval of no decision. As a result, some tolerance 
to noise is provided. A noisy test pattern vector which 
would otherwise satisfy equation (2) may lead to an 
inner product lying in the dead-zone, but is unlikely 
to be misclassified. 
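
The three-way decision implied by the dead-zone can be sketched as follows. This is a minimal illustration in Python, assuming the right-hand side of equation (2) is used as the half-width of the zone. The function name, the default parameter values and the string labels are hypothetical choices made for the example.

```python
import math


def classify(x, w, K=0.5, gamma=0.0, alpha=1.0):
    """Classify test vector x with weight vector w, leaving a dead-zone.

    The half-width of the dead-zone is K * |x|**gamma * |w|**alpha,
    i.e. the right-hand side of inequality (2).
    """
    inner = sum(a * b for a, b in zip(x, w))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_w = math.sqrt(sum(b * b for b in w))
    half_width = K * norm_x ** gamma * norm_w ** alpha
    if inner > half_width:
        return "C"            # clearly on the positive side
    if inner < -half_width:
        return "C-bar"        # clearly on the negative side
    return "no decision"      # inner product falls in the dead-zone


print(classify((2.0, 0.0), (1.0, 0.0)))
print(classify((0.2, 0.0), (1.0, 0.0)))
```

With K = 0 the dead-zone vanishes except for an inner product of exactly zero, so the sketch essentially reduces to the sign test of the standard procedure.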

However, only some special cases of equation (2) 
will really concern us. The cases which will not be of 
interest are 

$$Y_j \cdot W' > K\,|W'|^{\alpha}, \qquad j = 1, 2, \ldots, m \tag{3}$$

$$0 < K < \infty, \quad 0 \le \alpha < \infty, \quad \alpha \ne 1$$

The reason they are not of interest will now be given. 
It will be demonstrated that a $W'$ satisfying equation 
(3) may be derived from a $W$ satisfying equation (1) by 
a simple change in magnitude; thus, since $W$ and $W'$ 
would differ only in magnitude and not in orientation, 
the classification ability of $W'$ would be identical to that 
of $W$. (It should be pointed out that if an actual iteration 
process is carried out to arrive at a weight vector 
$W$ for equation (1) and a weight vector $W'$ for equation 
(3), $K \ne 0$, the weight vectors are likely to be oriented 
differently in the multi-dimensional space and would 
thus lead to different generalizations. The important 
point to note is the possibility that $W'$ may be oriented 
in the same direction as one of the possible $W$ vectors.) 

Assume that a weight vector $W$ has been found by 
some means which satisfies equation (1). Denote the 
minimum of the inner product values indicated in the 
left-hand side of equation (1) by $\delta$, i.e., 

$$\min_{j} \; Y_j \cdot W = \delta > 0$$

The postulate will be shown to be true by deriving 
a scalar constant $S > 0$ such that the weight vector 

$$W' = SW$$

will satisfy equation (3). Since the above operation on 
$W$ changes its magnitude only and not its direction, the 
minimum of the inner products $Y_j \cdot W'$ still occurs at 
the same value of $j$ (more than one value of $j$ may 
produce the minimum value, but this fact is of no 
concern to the present development). Let this value 
of $j$ be designated as $j'$. Thus, 

$$Y_{j'} \cdot W = \delta$$

Then multiplying both sides of the above equation by 
$S$ yields 

$$Y_{j'} \cdot SW = S\delta$$

or 

$$Y_{j'} \cdot W' = S\delta$$

In order to achieve equation (3), it is necessary to find 
an $S$ such that 

$$S\delta = K\,|W'|^{\alpha} = K S^{\alpha} |W|^{\alpha}$$

Rewriting the above equation, 

$$S = \left[ \frac{K\,|W|^{\alpha}}{\delta} \right]^{\frac{1}{1-\alpha}}$$

which is the suitable value for satisfying equation (3). 
Note that $S$ cannot be determined for $\alpha = 1$. 

If among all possible cases of equation (2) the cases 
given in equation (3) are not considered, the equation 
of interest is either an inequality in the form of equation 
(3) with $\alpha = 1$, $\gamma = 0$, 

$$Y_j \cdot W' > K\,|W'|, \qquad j = 1, 2, \ldots, m \tag{4}$$

$$0 < K < \infty$$

or, the general equation (2) with $\gamma \ne 0$, 

$$Y_j \cdot W' > K\,|Y_j|^{\gamma}\,|W'|^{\alpha}, \qquad j = 1, 2, \ldots, m \tag{5}$$

$$0 < K < \infty, \quad 0 < \gamma < \infty, \quad 0 \le \alpha < \infty$$

Equation (4) has another interpretation. If this equation 
is written as 

$$Y_j \cdot \frac{W'}{|W'|} > K, \qquad j = 1, 2, \ldots, m$$

$$0 < K < \infty$$

it is apparent that the inner products of the weight 
vector, normalized to unity, and the vectors $Y_j$ are 
computed in this algorithm. This implies that the only 
way the weight vector can affect the value of the inner 
products is by changing its direction in space and not 
by changing its magnitude. This fact is of considerable 
interest since it has already been demonstrated how a 
simple change in magnitude of a weight vector satisfying 
the simple equation (1) can make the new weight 
vector satisfy the more complex equation (3), even 
though the generalization property remains unchanged. 
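
The magnitude-only rescaling discussed above can be checked numerically. The following Python sketch computes the scalar S from a weight vector W that satisfies equation (1) and forms W' = SW. The pattern vectors, the values of K and α, and the function names are made up for the illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def scale_factor(patterns, w, K, alpha):
    """S = (K |W|**alpha / delta) ** (1 / (1 - alpha)), undefined for alpha = 1."""
    if alpha == 1:
        raise ValueError("S cannot be determined for alpha = 1")
    delta = min(dot(y, w) for y in patterns)   # smallest inner product in (1)
    return (K * norm(w) ** alpha / delta) ** (1.0 / (1.0 - alpha))


patterns = [(1.0, 0.0), (1.0, 1.0)]            # hypothetical training vectors
w = (1.0, 0.0)                                 # satisfies Y . W > 0 for both
S = scale_factor(patterns, w, K=2.0, alpha=0.5)
w_scaled = tuple(S * wi for wi in w)
print(S, min(dot(y, w_scaled) for y in patterns), 2.0 * norm(w_scaled) ** 0.5)
```

With this S the minimizing pattern lies exactly on the boundary K|W'|^α, so the direction of W is unchanged while the threshold of equation (3) is just reached; a slightly larger S (for α < 1) or a slightly smaller S (for α > 1) would make the inequality strict.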

The use of equation (4) is advocated in this paper. 
In the following paragraphs, however, an approach to 
the more general equation (2) will be considered first. 
Substitution of suitable values for the different param- 
eters of equation (2) will then realize the results for 
equation (4). It will be found that a bound for the 
convergence rate may not always be obtained for the 
approach adopted in this paper. 

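The procedure parallels the proof for the standard 
Adaline technique. 7 The iteration method is defined by 
the equations 

$$W'_{j+1} = \begin{cases} W'_j & \text{if } Y_j \cdot W'_j > K\,|Y_j|^{\gamma}\,|W'_j|^{\alpha} \\ W'_j + \dfrac{\mu}{|Y_j|^{\beta}}\, Y_j & \text{otherwise;} \end{cases}$$

$$0 < \mu < \infty, \quad 0 \le \beta < \infty$$

where the constants $\mu$ and $\beta$ have been incorporated 
for further generality. 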
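
The iteration just defined can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the program used for the experiments. The function name, the cyclic presentation of the patterns, the pass limit and the sample data are all assumptions of the sketch.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def train_dead_zone(patterns, K=0.5, gamma=0.0, alpha=1.0,
                    mu=1.0, beta=0.0, max_passes=1000):
    """Cycle through the patterns, correcting W' whenever inequality (2) fails.

    Returns (weights, converged). Convergence is not guaranteed in general,
    so the number of passes is capped (an assumption of this sketch).
    """
    w = [0.0] * len(patterns[0])               # initial weight vector is zero
    for _ in range(max_passes):
        corrected = False
        for y in patterns:
            threshold = K * norm(y) ** gamma * norm(w) ** alpha
            if dot(y, w) > threshold:
                continue                       # pattern already outside dead-zone
            step = mu / norm(y) ** beta        # correction increment mu / |Y|**beta
            w = [wi + step * yi for wi, yi in zip(w, y)]
            corrected = True
        if not corrected:
            return w, True                     # a full pass without corrections
    return w, False


patterns = [(1.0, 0.0), (1.0, 0.2), (1.0, -0.2)]   # hypothetical data
weights, converged = train_dead_zone(patterns, K=0.5)
print(weights, converged)
```

The defaults γ = 0, α = 1, μ = 1, β = 0 correspond to the normalized form of equation (4); setting K = 0 recovers the standard fixed-increment rule.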

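Following standard convention, a reduced training 
sequence $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_k, \ldots$ and a reduced weight vector 
sequence $\hat{W}_1, \hat{W}_2, \ldots, \hat{W}_k, \ldots$ are formed 7 and the following 
discussion is based on these sequences. The initial 
gain vector is assumed to be zero, $\hat{W}_1 = 0$. Then, 

$$\hat{W}_{k+1} = \mu \sum_{i=1}^{k} \frac{\hat{Y}_i}{|\hat{Y}_i|^{\beta}} \tag{6}$$

Taking the dot product of the solution vector $W'$ 
(which is unknown, but is assumed to exist) with both 
sides of the above equation yields 

$$\hat{W}_{k+1} \cdot W' = \mu \sum_{i=1}^{k} \frac{\hat{Y}_i \cdot W'}{|\hat{Y}_i|^{\beta}}$$

From equation (2) 

$$\hat{Y}_i \cdot W' > 0$$

Let 

$$A_m = \min_{i = 1, 2, \ldots, k} \frac{\hat{Y}_i \cdot W'}{|\hat{Y}_i|^{\beta}}$$

Then, 

$$\hat{W}_{k+1} \cdot W' \ge k \mu A_m$$

But, the Cauchy-Schwarz inequality states that 

$$|\hat{W}_{k+1}|^2 \ge \frac{(\hat{W}_{k+1} \cdot W')^2}{|W'|^2}$$

Therefore, 

$$|\hat{W}_{k+1}|^2 \ge \frac{k^2 \mu^2 A_m^2}{|W'|^2} \tag{7}$$

This is a lower bound for the magnitude of $\hat{W}_{k+1}$. 
Another line of reasoning will give an upper bound 
as well. 

Since 

$$\hat{W}_{j+1} = \hat{W}_j + \mu \frac{\hat{Y}_j}{|\hat{Y}_j|^{\beta}}$$

it follows that 

$$|\hat{W}_{j+1}|^2 = |\hat{W}_j|^2 + \mu^2 |\hat{Y}_j|^{2(1-\beta)} + 2\mu \frac{\hat{Y}_j \cdot \hat{W}_j}{|\hat{Y}_j|^{\beta}}$$

Therefore, using $\hat{Y}_j \cdot \hat{W}_j \le K\,|\hat{Y}_j|^{\gamma}\,|\hat{W}_j|^{\alpha}$, 

$$|\hat{W}_{j+1}|^2 - |\hat{W}_j|^2 \le \mu^2 |\hat{Y}_j|^{2(1-\beta)} + \frac{2\mu K\,|\hat{Y}_j|^{\gamma}\,|\hat{W}_j|^{\alpha}}{|\hat{Y}_j|^{\beta}}$$

The above inequality for $j = 1, 2, \ldots, k$ may be added 
to obtain 

$$|\hat{W}_{k+1}|^2 \le \mu^2 \sum_{j=1}^{k} |\hat{Y}_j|^{2(1-\beta)} + 2\mu K \sum_{j=2}^{k} |\hat{Y}_j|^{\gamma-\beta}\,|\hat{W}_j|^{\alpha} \tag{8}$$

From equation (6), 

$$\hat{W}_j = \mu \sum_{i=1}^{j-1} \frac{\hat{Y}_i}{|\hat{Y}_i|^{\beta}}$$

Hence, 

$$|\hat{W}_j| \le \mu \sum_{i=1}^{j-1} |\hat{Y}_i|^{(1-\beta)}$$

Let 

$$B_M = \max_{i = 1, 2, \ldots, j-1} |\hat{Y}_i|^{(1-\beta)}$$

Then, 

$$|\hat{W}_j| \le \mu (j-1) B_M$$

Also, let 

$$C_M = \max_{j = 2, 3, \ldots, k} |\hat{Y}_j|^{\gamma-\beta}$$

Therefore, the last term in the right-hand side of 
equation (8) may be written as 

$$2\mu K \sum_{j=2}^{k} |\hat{Y}_j|^{\gamma-\beta}\,|\hat{W}_j|^{\alpha} \le 2\mu K \sum_{j=2}^{k} C_M\, \mu^{\alpha} (j-1)^{\alpha} B_M^{\alpha}$$

Thus, equation (8) leads to 

$$|\hat{W}_{k+1}|^2 \le \mu^2 k B_M^2 + 2\mu^{1+\alpha} K B_M^{\alpha} C_M \sum_{j=2}^{k} (j-1)^{\alpha}$$

Since, 

$$\sum_{j=2}^{k} (j-1)^{\alpha} = \sum_{j=1}^{k-1} j^{\alpha} \le \int_{0}^{k-1} (j+1)^{\alpha}\, dj < \frac{k^{1+\alpha}}{1+\alpha}$$

it follows that 

$$|\hat{W}_{k+1}|^2 \le \mu^2 k B_M^2 + \frac{2}{1+\alpha}\, \mu^{1+\alpha} K B_M^{\alpha} C_M k^{1+\alpha} \tag{9}$$

which gives the upper bound on the magnitude of 
$\hat{W}_{k+1}$. Combining equations (7) and (9), 

$$\frac{k^2 \mu^2 A_m^2}{|W'|^2} \le \mu^2 k B_M^2 + \frac{2}{1+\alpha}\, \mu^{1+\alpha} K B_M^{\alpha} C_M k^{1+\alpha} \tag{10}$$

which must be satisfied if a solution vector exists. 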

At this point, it is necessary to substitute suitable 
values of the various parameters to gain further insight. 
For example, if the parameters are set as 

$$\beta = 0, \quad \mu = 1, \quad \text{and} \quad K = 0$$

the standard bounds described in the literature 7 for 
conventional Adaline-type devices are obtained. For 
the experiments reported in this paper, the parameters 
are 

$$\beta = 0, \quad \mu = 1, \quad K > 0, \quad \gamma = 0, \quad \alpha = 1$$

Substitution of these parameter values in equation (10) 
leads to 

$$\frac{k^2 A_m^2}{|W'|^2} \le |\hat{W}_{k+1}|^2 \le k B_M^2 + K B_M k^2 \tag{11}$$

where 

$$A_m = \min_{j = 1, 2, \ldots, k} \hat{Y}_j \cdot W' \qquad \text{and} \qquad B_M = \max_{j = 1, 2, \ldots, k-1} |\hat{Y}_j|$$

Thus, the above inequality leads to 

$$\frac{k A_m^2}{|W'|^2} \le B_M^2 + k K B_M$$

The left- and right-hand sides of this inequality have 
been plotted in Figure 1. It is clear that when the two 
straight lines intersect, as in case (1), 

$$\frac{A_m^2}{|W'|^2} > K B_M,$$

a definite upper bound on the number of steps $k = k_M$ 
exists, provided, of course, that the solution vector 
exists. On the other hand, for cases (2) and (3) where 

$$\frac{A_m^2}{|W'|^2} \le K B_M,$$

no such upper bound exists and convergence of the 
procedure is not guaranteed. 
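
Solving the linear inequality above for k gives the intersection point of the two straight lines, k_M = B_M^2 / (A_m^2/|W'|^2 - K B_M), whenever the slope condition of case (1) holds. The short Python sketch below evaluates this expression. The function name and the sample numbers are hypothetical, and the quantities A_m, B_M and |W'| would have to be known in advance, which is not the case in practice.

```python
def step_bound(a_m, b_m, K, w_norm):
    """Upper bound k_M on the number of corrections, or None if none exists.

    Derived from  k * a_m**2 / w_norm**2 <= b_m**2 + k * K * b_m.
    """
    left_slope = a_m ** 2 / w_norm ** 2    # slope of the left-hand line
    right_slope = K * b_m                  # slope of the right-hand line
    if left_slope <= right_slope:
        return None                        # cases (2) and (3): lines never cross
    return b_m ** 2 / (left_slope - right_slope)


print(step_bound(a_m=1.0, b_m=2.0, K=0.0, w_norm=1.0))   # K = 0: standard bound
print(step_bound(a_m=1.0, b_m=2.0, K=1.0, w_norm=1.0))   # no bound exists
```

For K = 0 the expression reduces to B_M^2 |W'|^2 / A_m^2, the form obtained from equation (11) with the dead-zone removed.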

As in the standard literature, 7 the bound is not useful 
in estimating how many steps will be required in a 
given situation, since it depends on the knowledge of 
a solution vector $W'$. 

It has been shown that the algorithm in equation 
(4) is desirable because it forces the gain vector to 
change its direction in space; a simple change in magnitude 
cannot help in satisfying the inequality of equation 
(4). At the same time, the possibility of obtaining 
a solution in a finite number of steps exists. In the 
next section, this algorithm is the basis for some 
experiments with real data. 
Experimental results 

The nature of speaker verification allows one to 
perform experiments which are fairly well controlled. 
Since most speaker verification applications provide 
cooperative users — individuals desiring verification — 
it is possible to require each user to utter a particular 
phrase. The phrase can be designed to carry a maximum 
of speaker-dependent information. The choice for 
the experiments being reported here was “Check 
Available Terminals.” Each speaker included in the 
test was asked to utter this and four other such phrases 
in a predefined but randomized order, interspersing 
each utterance with an utterance-labeling task to 
prevent interaction between adjacent phrases. Re- 
cordings of these utterances were made in an acoustical- 
ly treated room using a wide-band recording system. 
A boom-mounted microphone and headset combination 
assured constant microphone placement. Each subject 
was asked to speak in a normal voice, and a level 
adjustment was made to provide approximately the right 
input signal level to the tape recorder. It was felt that 
these rather idealized conditions would allow evalua- 
tion of optimum verification performance. In addition, 
in certain applications the real data may approach this 
idealized high quality. 

TABLE I — Filter bank specifications 


Filter     Center Frequency     ± db 
Number     Hz.                  Bandwidth 

   1          188                  250 
   2          459                  250 
   3          715                  250 
   4          969                  250 
   5         1220                  250 
   6         1472                  250 
   7         1725                  250 
   8         1975                  250 
   9         2225                  250 
  10         2475                  250 
  11         2725                  250 
  12         2991                  290 
  13         3300                  330 
  14         3659                  390 
  15         4083                  460 
  16         4586                  550 
  17         5194                  670 
  18         5954                  860 
  19         6932                 1110 
  20         8203                 1450 


In total, utterances from 118 male speakers were 
used. Fifty of these were arbitrarily assigned as “reals” 
and 100 utterances of each phrase were collected from 
each speaker over about a five-to-ten-week period. 
Each of the other speakers was assigned to the “im- 
postor” class; each uttered each of the five phrases 20 
times, all at one time. 

The analog recordings were digitized using the hardware 
shown in Figure 2 and Table I. It consisted primarily 
of 20 bandpass filters covering the range of 
center frequencies 188 Hz to eight kHz. The lower 
frequency filters had 250-Hz bandwidths while the 
higher frequency filters were somewhat broader. A 
20-ms sampling interval was employed. The output 
of each filter was rectified and integrated over each 
sampling interval. The value of this integral was converted 
logarithmically into a four-bit value spanning 
a 32-db range. Only two other pieces of hardware were 
used: an automatic level control (ALC) and a fundamental 
frequency detector. The former maximized 
use of the full dynamic range of the A/D conversion 
system. Further, to allow reconstruction of the original 
absolute signal level, the value of the gain of this ALC 
circuit was digitized for each sampling interval. The 
fundamental frequency (pitch) detector also passed a 
digital estimate of the pitch period to the computer 
for each sampling interval. Otherwise, this pitch information 
would have been unavailable because of 
the width of the bandpass filters. Smith has described 
the pitch determination method used. 16 
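
The logarithmic conversion to a four-bit value can be illustrated with a small Python sketch. The text gives only the word length and the 32-db span, so the sketch assumes sixteen uniform steps of 2 db each, a power-like input (10 log10), and a hypothetical full-scale reference. None of these details are taken from the actual hardware.

```python
import math


def quantize_log(value, full_scale=1.0, range_db=32.0, bits=4):
    """Map an integrator output to a 4-bit code on a logarithmic scale.

    Code 15 is the top 2-db step below full scale; code 0 collects
    everything 30 db or more below full scale (assumed layout).
    """
    levels = 1 << bits                      # 16 codes for four bits
    step_db = range_db / levels             # 2 db per code
    if value <= 0.0:
        return 0                            # no signal: lowest code
    level_db = 10.0 * math.log10(value / full_scale)
    code = int(math.floor((level_db + range_db) / step_db))
    return max(0, min(levels - 1, code))    # clip to the representable range


print([quantize_log(v) for v in (1.0, 0.5, 0.01, 0.0)])
```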

The remaining experimental steps were executed 
through programming. It was felt that implementing 
most of the system by software and using a general-purpose 
hardware analyzer would maximize the flexibility 
of the system. Even greater flexibility could be 
obtained by simply sampling the analog speech waveform 
and storing digitized samples, but the quantity 
of the data to be processed would be prohibitive. In 
total, approximately 13 hours of analog recordings of 
the phrase, “Check Available Terminals,” were processed. 

Figure 2 — Functions of analyzing hardware 

The first step of utterance-processing, segmentation, 
was a speech-recognition process which would operate 
with good reliability over a large population of speakers 
because the phrase to be recognized was known. This 
step automatically eliminated improperly spoken or 
digitized utterances. It also served a time-alignment 
function, allowing comparison of like sounds from ut- 
terance to utterance and from speaker to speaker. 

Ten points in time were found for each utterance. 
Each segmentation point was defined by a precise 
set of acoustic rules which will not be given here. The 
points were given the following symbols, which correspond 
roughly to the standard orthography of the 
words: 

checK aVaiLaBle TERMinalS1 S2 
(S1 = onset of S; 
S2 = end of S.) 

The segmentation rules were determined by the fol- 
lowing iterative process. A group of ten speakers was 
selected arbitrarily and programs were designed to 
segment their utterances properly. Accuracy of seg- 
mentation was verified by studying digital spectrogram 
patterns of each of the utterances. Once designed, the 
rules were tested on another arbitrary set of ten speak- 
ers. The accuracy of segmentation was improved by 
accounting for factors manifested in these new speakers. 
Once the rules seemed sufficiently accurate, in terms 
of testing on new utterances by this combined set of 
20 speakers, these rules were evaluated on an inde- 
pendent group of 20 speakers. Performance appeared 
consistent; that is, no significant segmentation prob- 
lems were apparent in this new set of speakers and the 
segmentation programs were considered complete. 
Space does not permit detailed description of the final 
segmentation rules. Roughly speaking, they involved 
the following functions: voicing detection, frication 
detection, total signal power, and second formant fre- 
quency. Consideration of these functions and the known 
context of a fixed phrase permitted quite accurate 
segmentation over a broad population of speakers. 

As a preliminary rule, all utterances were required 
to have defined locations for all ten segmentation points. 
This restriction resulted in ten percent of the phrases 
being rejected. Phrase rejection, which implies no decision 
by the machine as to speaker identity, should 
be contrasted with speaker rejection. Most applications 
would be less sensitive to unnecessary phrase rejection 
than to speaker misclassification. Furthermore, the 
phrase-rejection rate could be reduced substantially if later 
stages of recognition were designed to operate on a 
partially segmented utterance. 

The next phase, feature extraction, used a segmented 
utterance for input and produced a vector of features. 
For this set of experiments, determining the features 
was a two-step process. First, a large set of “proposed” 
features was selected. This choice was based upon pre- 
vious research by the authors and their colleagues, as 
well as on published results of experiments involving 
human and automatic speaker identification. 1 ’ 4 ’ 8 - 9 - 11 
Second, the list of features was shortened for economy 
of implementation. The “goodness” criterion used to 
determine whether or not to include a particular feature 
was the F-ratio of analysis of variance. 4 

A detailed list of the proposed features would be 
too lengthy to include here; instead, the general types 
of functions employed will be described. A complete 
description of the features is given elsewhere. 14 The 
most common function was an integration of the 
power in one or more filters over a number of time 
samples. To perform this integration, the log power 
values determined by the hardware were converted 
to a linear scale, summed, and then reconverted to 
log scale. This had the effect of simulating the same 
type of analyzer with broader filters and longer inte- 
gration intervals. Three “bandwidths” were chosen 
for integration: a single filter, a band of several filters, 
roughly approximating a single formant region, and 
the entire set of filters, corresponding to the power in 
the original signal during the 20-ms sampling interval. 
Three intervals of integration were also used: a short 
period of two to four time samples centered at a seg- 
mentation point, medium-length intervals extending 
from one segmentation point to the next, and long in- 
tervals encompassing several segmentation points. 
Most of the combinations of these integration regions 
were employed at each segmentation point. 
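
The log-domain integration described above (convert to a linear scale, sum, reconvert) can be sketched as follows. The Python function below is a minimal illustration. Its name is hypothetical, and it assumes that the stored values are expressed in db and represent a power-like quantity.

```python
import math


def integrate_log_power(levels_db):
    """Sum filter outputs given in db: to linear scale, add, back to db."""
    linear_sum = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(linear_sum)


# Two equal 10-db cells, e.g. adjacent filters or adjacent time samples.
print(integrate_log_power([10.0, 10.0]))
```

Summing two equal values raises the result by about 3 db, as expected for a doubling of power. Averaging the db values directly would not give this behavior, which is why the conversion to a linear scale is needed.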

In order to detect finer differences between utterances, 
a section of each utterance was subjected to 
“time normalization.” The time-frequency matrix of 
filter values from the sample labeled “V” to that labeled 
“L” was “stretched” or “shortened” by linearly 
interpolating the sampled output of each filter integrator 
to provide a fixed number of samples. Various 
integrals like those described above were determined 
during this time-normalized section also. 
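
Time normalization by linear interpolation can be sketched for the output of a single filter. The Python function below is an illustration under the assumption that the first and last samples of the section are kept fixed and the remaining output samples are spaced evenly between them. The function name and the sample values are hypothetical.

```python
def time_normalize(samples, n_out):
    """Stretch or shorten one filter's samples to n_out values."""
    n_in = len(samples)
    if n_in == 1:
        return [samples[0]] * n_out          # nothing to interpolate between
    if n_out == 1:
        return [samples[0]]
    result = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)   # fractional index into the input
        i = min(int(pos), n_in - 2)          # left neighbor, kept in range
        frac = pos - i
        result.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return result


print(time_normalize([0, 10], 5))            # stretching
print(time_normalize([0, 2, 4, 6, 8], 3))    # shortening
```

Applying the same function to every filter channel of the section would yield a time-frequency matrix with a fixed number of columns.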

Programs were written to estimate approximate 
formant frequencies and amplitudes as well. Formants 
are characterized by amplitude maxima in the frequency 
spectrum and are the result of the transfer 
function of the vocal tract. 2 There is reason to believe 
that consistent differences exist among various speakers 
in absolute formant frequencies and detailed formant 
transitions from sound to sound, even though the 
approximate motions are the same from talker to talker. 
These would reflect an interplay between individual 
structural and behavioral differences. 

These various functions resulted in a total of 405 
proposed features. It was obvious by their design that 
they were not independent, either functionally or 
statistically, but no logical basis was available to select 
independent features that would be good, a priori, for 
speaker verification. The second step of reducing the 
feature set employed analysis of variance, ranking the 
405 features according to their F-ratio. This measures 
a quantity proportional to the variance of the speakers' 
means divided by the mean of each speaker's variance. 
Such a measure has the desirable properties of invariance 
to translation and scaling. No measure of feature 
dependence was calculated. The rank orders were tested 
for consistency across different speaker populations. 
The rankings were determined for two different groups 
of 25 speakers each and rank correlation coefficients 
were calculated. 12 It was determined that the F-ratio 
was a consistent measure of relative feature worth 
when computed over a set of 25 speakers. All of the 
experiments to be reported used the same feature set, 
the best 200 features being determined by a composite 
ranking based on 50 speakers. Details of the ranking 
are given elsewhere. 14 
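The F-ratio ranking described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function names and the nested-list data layout (speaker, utterance, feature) are assumptions made for the example.

```python
from statistics import mean, variance


def f_ratio(samples_by_speaker):
    """F-ratio of one feature: variance of the speakers' means
    divided by the mean of each speaker's variance."""
    speaker_means = [mean(s) for s in samples_by_speaker]
    speaker_vars = [variance(s) for s in samples_by_speaker]
    return variance(speaker_means) / mean(speaker_vars)


def rank_features(data):
    """data[speaker][utterance][feature] -> (feature order, F-ratios).

    The order lists feature indices from highest to lowest F-ratio.
    """
    n_features = len(data[0][0])
    ratios = []
    for j in range(n_features):
        per_speaker = [[utt[j] for utt in speaker] for speaker in data]
        ratios.append(f_ratio(per_speaker))
    order = sorted(range(n_features), key=lambda j: ratios[j], reverse=True)
    return order, ratios
```

Because both numerator and denominator are variances, shifting or rescaling a feature leaves its F-ratio unchanged, which is the invariance property noted in the text.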

Provision had to be made for features that sometimes 
did not exist or for which an estimate of value did not 
exist. For example, for certain portions of some ut- 
terances the system was unable to determine, ade- 
quately, pitch frequency or some formant frequency. 
This phenomenon will occur to some degree in all fea- 
ture-extraction systems. A missing feature value poses 
interesting theoretical problems in the design of a 
decision method. Should one estimate a value for it 
on the assumption that the feature really did exist but 
the system was not sophisticated enough to determine 
its value? Or should the feature really be presumed 
missing in the original signal and the utterance con- 
sidered in a special manner indicating that it is not 
like utterances in which the feature appeared to exist? 
Sebestyen 8 addressed these questions in relation to 
probabilistic decision methods, but another approach
seemed needed for the non-parametric Adaline tech- 
nique used here. One possible good approach would be 
to determine the relative frequency with which each 
feature was missing in both the real speaker training 
data and that of the training impostors. During recog-
nition, a value would be substituted for each missing
feature which favored neither the real nor the impostor
class. Such a value could be the mean of the feature 
value averaged over both real and impostors. The fact 
that it was missing would be realized by changing the 
a priori probabilities of the two classes in accordance 
with the previously stored relative frequencies. Thus, 
if a real speaker consistently had a feature missing 
during training, and that same feature was missing 
during recognition, the recognition threshold would 
be shifted in favor of accepting the utterance as that 
of the real speaker. 

In the experiments reported here a simpler strategy 
was employed because of the relative infrequency of 
missing features. A mean value was retained for each 
feature that was ever missing from the real speaker's 
training data. During recognition, if one of these 
features was missing, the stored mean value was used 
as an estimate of the missing feature. If a feature was 
non-existent during recognition but always existed 
during training, the utterance was ignored entirely. 
Almost all utterances ignored in this way were impostor 
utterances and recognition performance would probably 
not be degraded significantly if each of these utterances 
was classified as being that of an impostor, but these 
statistics were not calculated. Approximately four 
percent of the recognition impostor set of utterances 
were ignored in this way.
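A minimal sketch of the simpler missing-feature strategy follows. The use of `None` to mark a missing value, the function names, and the fallback of 0.0 for a feature that was never observed in training are assumptions of this example, not details given in the text.

```python
def build_missing_means(training_utterances):
    """Return {feature index: mean of observed values} for every feature
    that was missing at least once in the real speaker's training data.
    A missing value is represented by None (an assumption of this sketch)."""
    n_features = len(training_utterances[0])
    stored = {}
    for j in range(n_features):
        column = [u[j] for u in training_utterances]
        if any(v is None for v in column):
            observed = [v for v in column if v is not None]
            # Fallback of 0.0 when a feature was never observed is hypothetical.
            stored[j] = sum(observed) / len(observed) if observed else 0.0
    return stored


def fill_or_reject(utterance, stored_means):
    """Substitute stored means for missing features during recognition.
    Return None (utterance ignored) if a feature is missing now but was
    always present during training."""
    filled = []
    for j, value in enumerate(utterance):
        if value is None:
            if j not in stored_means:
                return None
            value = stored_means[j]
        filled.append(value)
    return filled
```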

The adaptive linear decision algorithm described 
earlier was used for all experiments described here. 
Preliminary experiments were performed to determine 
a good value for K, the relative training threshold. It 
was determined that K = 5 provided a good trade- 
off since convergence was obtained in a reasonable 
amount of time and higher values of K significantly 
increased training time with little improvement in 
generalization performance. 

In order to perform training, the set of real utterances 
was stored in memory. The larger set of impostor data 
resided on direct access storage. The algorithm pro- 
ceeded through the data, obtaining utterances alter- 
nately from the real and impostor sets. When the end 
of either set was reached, selection began again at the 
first of the completed set but continued from wherever 
it happened to be in the other set. The method provided 
rapid convergence since the algorithm was always pre-
sented with a member of the “other” class after
adapting to the first class. A “pass” was defined to be
one complete loop through the longer of the two lists 
of utterances — in our case, the impostor set. 
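The alternating presentation order can be sketched as a generator. This is an illustrative reading of the paragraph above; the function name and the string labels are assumptions, and the adaptation step itself is omitted.

```python
def alternating_passes(real, impostors, n_passes):
    """Yield (label, utterance) pairs alternately from the real and impostor
    sets. Each set wraps around independently when its end is reached; one
    pass is one complete loop through the longer of the two lists."""
    longer = max(len(real), len(impostors))
    i_real = i_imp = 0
    for _ in range(n_passes * longer):
        yield ("real", real[i_real % len(real)])
        yield ("impostor", impostors[i_imp % len(impostors)])
        i_real += 1
        i_imp += 1
```

With two real utterances and three impostor utterances, one pass presents the sequence r1, i1, r2, i2, r1, i3: the shorter list restarts while the longer one continues.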

The training data consisted of approximately the 
first 50 utterances of the real speaker being tested and 
nine from each of the 29 impostors. These rather ar- 
bitrary numbers were the result of practical factors, 
such as program running time, storage space, and the
total number of speakers and utterances available. 
Further experiments have indicated that generalization 
results do not depend strongly on the exact amount of 
impostor training data unless one significantly reduces 
the number of impostors involved. 

The recognition data consisted of the remaining ut-
terances from the real speaker (about 50) and all avail-
able utterances (20 or fewer) from each of 39 other im-
postors. Thus, testing generalization of acceptance of
the real speaker involved utterances produced by him
after producing all of the training data, while gener-
alization of impostor rejection was tested using entirely
new people that the training algorithm had never 
processed. 

Computation time on an IBM System/360 Model
40 was approximately one minute for each training
pass and likewise for recognition of 700 utterances.

Table II lists the results of these experiments. The 
accuracy figure tabulated for each real speaker is the
misclassification rate (impostor as real and real as
impostor) for the case of the two classes being equally 
likely. In many applications the a priori probabilities 
would be unequal and the costs associated with the 
two types of errors would be different, thereby making 
a statement of a single misclassification probability 
uninformative. Ignoring the question of cost dif- 
ferences, the distribution of errors was usually such that 
unequal a priori probabilities should allow reduction, 
or at least no increase, in the probability of system 
misclassification (both types combined). Figure 3 
shows typical distributions of recognition dot products 
for two real speakers. The probability-density function 
of the dot product has been integrated from the left 
for the real speakers and from the right for the set 
of recognition impostors. The ordinate value corre-
sponding to a particular abscissa value corresponds to
the percentage error that would be experienced for 
that class (real or impostor) if the recognition threshold 
were placed at that value. 


TABLE II — Generalization error over fifty speaker real set (Crossover error rate)

Speaker   Passes*   Error (%)      Speaker   Passes   Error (%)

   1         5         .3             26        3        .2
   2        10         .8             27        4       5.1
   3         7        1.2             28        3       1.4
   4         6         .0             29        2        .0
   5         5        1.4             30        9        .0
   6         9         .0             31        2        .6
   7         5         .1             32       14        .0
   8         4         .1             33        2        .3
   9         4         .7             34        2        .0
  10         8        2.4             35        5        .0
  11         2         .2             36        6        .4
  12         6         .2             37        9       1.3
  13        15**      2.2             38        6        .1
  14         5         .5             39        6        .7
  15         5         .0             40        6        .7
  16        11        1.2             41        8       3.1
  17         9        3.4             42        4        .0
  18        15**      2.3             43        4        .3
  19        11         .2             44        4       2.3
  20         5         .4             45        5       1.2
  21         5        1.8             46        8        .0
  22         3         .0             47        4       1.8
  23        12        7.3             48        3        .2
  24         3        2.1             49        2        .7
  25         4         .3             50        4        .1

* Number of passes to reach convergence.

** Convergence not reached by 15 passes; non-converged gain used.
Figure 3 — Typical results of generalization tests (ordinate: cumulative %)


The training algorithm was designed to produce an 
optimum recognition threshold of zero (positive dot
product corresponding to the real speaker, negative to
impostors), but the resulting decision function was not 
symmetrical about the origin. Thus, the accuracy figures 
in Table II are based upon adjusting the recognition 
threshold to produce equal misclassification probabilities 
on recognition data. To obtain an intercept, the step- 
like nature of the cumulative real distribution was 
smoothed by linear interpolation. In practice, the 
recognition threshold must be set in some other way 
since independent data from the real speaker may not 
be immediately available. One method might be to 
set the threshold to produce a fixed rate of impostor 



Figure 4 — Generalization error histogram (abscissa: crossover generalization error rate, %)



Figure 5 — Probable true distribution of accuracy across 
many real speakers 


acceptance (Type II error in statistical terms) and let 
the real rejection rate be undetermined until the real 
speaker uses the system a number of times. 
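The equal-error (crossover) threshold described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the function names are hypothetical, and the linear interpolation is applied to both error curves between adjacent candidate thresholds, whereas the text mentions smoothing only the cumulative real distribution.

```python
def error_rates(threshold, real, impostor):
    """Fraction of real dot products rejected (below the threshold) and
    fraction of impostor dot products accepted (at or above it)."""
    false_reject = sum(s < threshold for s in real) / len(real)
    false_accept = sum(s >= threshold for s in impostor) / len(impostor)
    return false_reject, false_accept


def crossover(real, impostor):
    """Return (threshold, error rate) at which the two error rates are equal,
    interpolating linearly between the two candidate thresholds that
    bracket the crossing."""
    prev = None
    for t in sorted(set(real) | set(impostor)):
        fr, fa = error_rates(t, real, impostor)
        diff = fr - fa
        if diff >= 0:
            if prev is None or diff == 0:
                return t, (fr + fa) / 2
            t0, fr0, fa0 = prev
            d0 = fr0 - fa0                  # negative just before the crossing
            w = -d0 / (diff - d0)           # interpolation weight in (0, 1)
            return t0 + w * (t - t0), fr0 + w * (fr - fr0)
        prev = (t, fr, fa)
    t, fr, fa = prev                        # curves never cross: report last point
    return t, (fr + fa) / 2
```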

Figure 4 shows a histogram of the 50 accuracies in 
Table II. This distribution bears a resemblance to an 
exponential form, as might be expected. One would 
always expect a small percentage of people to have un- 
usually high error rates but no one can have negative 
error rates; hence, the skewed distribution. If a suffi-
cient number of recognition utterances was available
from the real speakers to allow accurate estimation of 
very low error rates, the true distribution of error 
would probably look more like Figure 5. This fundamen- 
tally imperfect accuracy would result from the inevit- 
able variation in speech patterns with time and be- 
cause, in the limit of a large enough recognition impostor 
set, someone would probably be found who is similar to 
any given speaker, at least within the precision of the 
features being used. 

CONCLUSIONS AND FURTHER WORK 

A more general technique than the conventional 
Adaline 7 approach has been treated in this paper. The
upper and lower bounds of equation (11), applicable 
to the present method, have been derived from the 
general bounds of equation (10). These general bounds 
may be exploited for other applications of equation (2).

In the conventional Adaline method, the iteration 
process guarantees a solution in a finite number of 
steps if a solution exists. In the approach adopted in 
this paper, the iteration process guarantees a solution 
in a finite number of steps if a solution exists and if 
this solution satisfies the condition of equation (11). 
Since in either of the above two cases the solution
vector is not known beforehand, the difference is only
a philosophical one. The experiments reported in this
paper, however, demonstrate that a solution vector
can indeed be found in most cases.

The value of the large data base is pointed out 
again. First of all, this large data base is directed 
toward an adequate representation for the real and 
the impostor classes. Even after the data base is divided
to conduct independent design and test experiments,
the above postulate remains largely valid. Also, in
many phases of the speaker-verification work (e.g.,
feature selection), an iterative method is unavoidable.
Thus, once a tentative design is created on some data,
the design is tested on a different set of data. If the 
design shows faults (large error rate), a new design 
is implemented by using both the former design and 
the former test data. This new design must now be 
tested on an entirely different set of data. This type of 
iterative procedure can only be realized if a large data 
base is available. 

It is felt that the accuracy obtained in the verifica- 
tion experiments is good and that enough people were 
involved in the test to produce meaningful results. 
The most comparable previously reported experimental 
results 1 state average accuracies of about ten percent 
with no provision for “No Decision.” Differences in 
data bases prohibit exact comparison of verification 
systems. The authors’ results cover a significantly 
larger base of reals than either of the previous experi- 
ments. 

The authors feel that much of the improvement in 
accuracy is the result of phrase selection and carefully 
designed segmentation algorithms but some of the 
improvement must be attributed to the rather idealized 
conditions under which utterances were gathered. 

However, the procedure was automatic once the 
segmentation program was designed. Further work is 
being pursued to determine the effect on current re- 
sults of degrading the signal in both bandwidth and 
signal-to-noise ratio. Female speakers will also be con- 
sidered. Improved results are most likely to be ob- 
tained through improving segmentation accuracy and 
flexibility, and the use of more sophisticated features 
(given better segmentation). It is felt that the present 
accuracy could be attained with fewer than 200 features 
by combining dependent features, if storage space 
presented a significant problem. 
A hybrid/digital software package for
the solution of chemical kinetic 
parameter identification problems 


by ALAN M. CARLSON 

Electronic Associates, Inc.
Princeton, New Jersey 


INTRODUCTION 

The modern hybrid computer offers many significant 
improvements over first generation hybrid systems.
These improvements include: 

1. The increased speed of digital computers en- 
abling programs to be written in hybrid FOR- 
TRAN without drastically limiting hybrid 
solution rates. 

2. The development of analog/hybrid software 
(e.g., hybrid simulation languages and analog 
set-up programs). 

The net result of these improvements has been an 
increase in the scope and complexity of hybrid appli- 
cations and a reduction in the effort required to program 
and debug hybrid problems. Unfortunately, the de- 
velopment of hybrid applications software has not 
kept pace with recent hybrid improvements. 

Applications software for purposes of this discussion 
is defined as an integrated set of digital/hybrid pro-
grams capable of solving the majority of frequently
occurring problems in a specific applications area. 
Based on this definition, little or no tangible information 
is currently available on the practicality of developing 
hybrid software packages although its benefits are 
obvious. 

In mid-1968, EAI's Princeton Computation Center
initiated a development project to determine the 
feasibility of hybrid applications software. The objec- 
tives of the project were to select a frequently occurring 


application area, develop general purpose software 
for it, and assess the resultant software based on the 
above definition, computer economics, ease of use, etc. 
The objectives of this paper are to present and illus- 
trate the use of the software package developed as a 
result of the above mentioned project. 

The chemical kinetic data analysis problem, which 
is often referred to as the chemical model building or 
parameter identification problem, was selected as the
applications area. Since the software package, which
will be referred to as the kinetic data analysis or KDA
package, solves chemical kinetic problems via either
all-digital or hybrid simulations, the question of simu-
lation economics and accuracy was investigated and
will also be discussed. 

The illustrative problem is the “Monsanto Bench- 
mark Problem” which has been well documented 2 ' 8 * 6 ” 8 
and typifies the chemical kinetic problems the KDA 
package was designed to solve. This problem requires 
the determination of twenty-two unknown parameters 
using thirteen sets of experimental data and a mathe- 
matical model requiring the simultaneous solution of 
seven non-linear differential equations. 

Problem analysis 

Referring to Figure 1, the kinetic data analysis
problem, which occurs during the initial phases of, say, 
plant design and economic optimization projects, has 
three essential, related parts. They are: 
Figure 1 — Typical kinetic data analysis flow diagram


1. Performing kinetic experiments to obtain the 
data necessary to determine the model. 

2. Proposing one or more mathematical models 
representing alternative kinetic mechanisms, 
chemical reactions, etc. 

3. Computational analysis of the proposed models 
by determining values for model parameters
(e.g., rate constants) that minimize the discrep- 
ancy between computed and experimental re- 
sults. 

The technology required to design and perform kinetic 
experiments is available and the initial derivation of 
mathematical models to simulate these experiments 
is not generally regarded as a difficult task. However,
the applications software required to evaluate these 
models is either unavailable, restrictive in a physical 
sense, or fails to provide the user with an efficient solu- 
tion to his problem. 

The project manager responsible for the solution of 
a kinetic data analysis problem, based on an impromptu 
survey, is not interested in becoming deeply involved in 
programming or underwriting extensive program de-
velopment studies to solve his problem. With the excep- 
tion of a few industrial organizations, the computa- 
tional alternatives at his disposal are not consistent 
with his interests. The computational alternatives are: 

1. Direct Simulation — The classical analog com- 
puter or digital simulation language study 10 
where the analyst adjusts model parameters in 
a trial and error fashion. This technique is gener- 
ally successful; however, it is very time consum- 
ing, susceptible to human error, and inefficient 


except for small problems. The advantage of 
direct simulation is that it provides the analyst 
with a great deal of knowledge about the physical 
behavior of the system being simulated. 

2. Parameter Estimation — A variety of digital 
computer programs that solve kinetic problems 
using, for example, statistical techniques, linear
and non-linear least squares, etc. Specific illus-
trations may be found in a recent article by 
Lapidus and Bard. 6 Unless the analyst is familiar 
with these programs and is capable of using 
them without making major modifications, their 
utilization creates a number of problems. These 
problems include: 

A. The mathematical techniques restrict the 
form of the data or the model, thereby 
influencing the design of kinetic experi- 
ments (e.g., batch-isothermal experi-
ments).

B. The infrequent use of statistical tech- 
niques or lack of a working knowledge 
of statistics makes it difficult for the user 
to evaluate program results and equate 
them to the physical problem. 

Parameter estimation programs do, however, 
represent a relatively economical means of 
solving kinetic problems if they can be used 
efficiently and without major revisions. 

3. Parameter Optimization — This technique uses 
general purpose optimization algorithms (e.g., 
gradient search) to automate the above men- 
tioned direct simulation technique. Referring 
to Figure 2, the optimization variables, X, 
which are unknown parameters in the kinetic 
model, are varied so as to minimize an objective 
function. The objective function, F, is a scalar 
quantity representing the error between com- 
puted and experimental results which may be 
obtained using a variety of mathematical rela- 
tionships (e.g., sum of squares, integral of the 
absolute error, etc.). As shown in Figure 2, the 
best current values of the algorithm variables,
X_B, are those model parameters resulting in the
“best fit” between experimental and computed
concentration data, X_F, when the algorithm can
no longer improve the objective function. This
technique is:

A. Theoretically the most general purpose 
approach to solving kinetic data analysis 
problems. It may be used in either all- 
Figure 2 — Simplified parameter optimization flow diagram


digital or hybrid simulations and the 
mathematical forms of the kinetic models 
and physical systems that can be inves- 
tigated are not restricted. 

B. Not generally used because many organ- 
izations do not have access to appropriate 
software and the development of this 
software imposes an intolerable financial 
burden on any one project. In the past, 
this technique was not widely used due 
to high digital production costs. The 
“Parameter Optimization” technique re-
quires several hundred simulations of
individual experiments per optimization 
run. 
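The parameter optimization loop of Figure 2 can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the kinetic model is a hypothetical first-order decay solved analytically, the objective function F is a sum of squares, and a simple coordinate pattern search stands in for whatever optimization algorithm the package actually uses.

```python
import math


def simulate(k, times, c0=1.0):
    """Hypothetical kinetic model: first-order decay A -> B."""
    return [c0 * math.exp(-k * t) for t in times]


def objective(x, times, measured):
    """F: sum of squared errors between computed and experimental data."""
    computed = simulate(x[0], times)
    return sum((c - m) ** 2 for c, m in zip(computed, measured))


def pattern_search(f, x0, step=0.5, tol=1e-6, max_iter=10000):
    """Vary each optimization variable by +/- step, keep any improvement,
    and halve the step when no trial improves the objective function."""
    x_best = list(x0)
    f_best = f(x_best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x_best)):
            for delta in (step, -step):
                trial = list(x_best)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < f_best:
                    x_best, f_best, improved = trial, f_trial, True
        if not improved:
            step /= 2
    return x_best, f_best
```

Each evaluation of `objective` corresponds to one simulation of an experiment, so the number of objective evaluations is a rough proxy for the simulation count per optimization run.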

The results of the above mentioned survey indicated 
a significant market existed for general purpose kinetic 
data analysis applications software if it could produce 
easily interpretable results, require minimal user partici- 
pation, and solve kinetic data analysis problems at a 
reasonable cost using the “Parameter Optimization” 
technique. These results were used as guidelines for 
the software development project. 

Software description 

The Kinetic Data Analysis package consists of 
several digital/hybrid processors whose individual 
functions and interactions are too complex to describe 
in this paper. However, referring to Figure 3, the cur- 
rent version of these processors may be visualized as 
five FORTRAN programs under the control of a Pro- 
gram Executive. The Program Executive restores and 
executes programs requested by the user, provides the 



Figure 3 — KDA program organization 


software package with a convenient mechanism to add 
programs, etc. 

The five programs shown in Figure 3 are an Analog 
Set-Up Program, a Data Preparation Processor, and 
three optimization programs. The optimization pro- 
grams are identical with the exception of the mathe- 
matical form and/or computer used to simulate the 
kinetic model or models. These programs, which have 
identical executive, optimization, and objective func- 
tion programs are: 

1. A hybrid optimization program using the analog 
computer to simulate kinetic models. 

2. An all-digital optimization program for kinetic 
models requiring the solution of one or more 
ordinary differential equations. 

3. An all-digital optimization program for kinetic 
models requiring the solution of a set of alge- 
braic equations (e.g., continuous stirred-tank 
reactor experiments).

The Analog Set-Up Program is an interactive pro- 
gram used, for example, to static check analog patch 
panels prior to executing hybrid production runs. Since
programs of this type are generally part of the operating 
system software for a hybrid computer, a description 
of this program will not be presented in this paper. 
Subsequent discussions will also exclude the Program 
Executive, since its function has, for all practical pur- 
poses, already been defined. Therefore, the description 
of the Kinetic Data Analysis package will be limited 
to the Data Preparation Processor and the optimization 
programs. 

A brief description of how the user interacts and com-
municates with the software package to solve a kinetics
problem is presented first to clarify later discussions.
TOTAL NUMBER OF CHEMICAL SPECIES, ........... [ ]
UNKNOWN ARRHENIUS RATE CONSTANTS, ........... [ ]
EXPERIMENTS OR SETS OF DATA, ................ [ ]
AND UNKNOWN MODEL PARAMETERS ................ [ ]
CATALYST VARIABLE TRANSFORMATION? ........... [ ]
NON-ISOTHERMAL EXPERIMENTS? ................. [ ]
DIGITAL SOLUTION OF KINETIC MODEL? .......... [ ]

DATA SET TEMPERATURE DATA IN DEGREES ........ [ ]
MINIMUM DATA SET TEMPERATURE ................ [ ]
MAXIMUM DATA SET TEMPERATURE ................ [ ]

Figure 4 — Typical KDA data form



Figure 5 — Data preparation processor flowchart 


User interaction communication 

The user’s first contact with the Kinetic Data 
Analysis package is a set of data forms (see Figure 4) 
that request experimental data and other related infor- 
mation in kinetic rather than computer terminology. 
These forms are transformed into a deck of punched 
cards and fed to the Data Preparation Processor. Re- 
ferring to Figure 5, if no errors are detected, the data 
is processed and the results are printed out and stored 
on tape. This tape contains all optimization algorithm 
and kinetic information required for the execution of 
the optimization program. 

To complete the data forms the user is required to 
provide a “yes” or “no” answer to the question, “All- 
Digital Solution?” The initial answer to this question 



Figure 6 — Flowchart for first phase of KDA study 


is “yes” regardless of the user’s intention to perform 
a hybrid simulation because, referring to Figure 6, the 
all-digital optimization program has a built-in mechan-
ism for obtaining:

1. An analog static check and dynamic check solu- 
tion. 

2. A cost estimate of the all-digital solution versus 
the hybrid solution cost for problems where the 
most economic alternative is questionable. 

3. An accurate estimate for all unknown analog 
scale factors. 

4. An overall dynamic test for hybrid simulations 

which are required to program and debug the analog 
model for hybrid studies. 

For all-digital studies, the Kinetic Data Analysis 
package supplies three partially programmed FOR- 
TRAN IV subroutines and a “Block Data” subrou- 
tine for kinetic models consisting of either algebraic 
equations (e.g., stirred-tank reactor) or ordinary dif- 
ferential equations (e.g., batch or flow reactors). The 
integration package uses a fourth order Runge-Kutta 
integration algorithm and a readily implemented 
mechanism is available to obtain the classical “error 
versus step size” data to determine the correct and 
most economical step size for the integration process. 
The three subroutines require the user to:

1. Store initial values of the variables being inte- 
grated in an integration initial condition array. 

2. Store computed results in a specified array. 

3. Compute intermediate variables and model 
derivatives or, for example, stage outputs using 
FORTRAN IV statements. 

Items one and two, typically, require two or three
statements and the requirements for item three are 
a function of the complexity of the kinetic model. 
The “Block Data” subroutine is used to define the total
number and names of intermediate and integration
variables for control and printout purposes.
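The fourth-order Runge-Kutta integration and the “error versus step size” study can be sketched as below. The package itself supplies FORTRAN IV subroutines; this Python version is only an illustration, and the two-step reaction A -> B -> C with rate constants 1.0 and 0.5 is a hypothetical model chosen because the concentration of A has a known exact solution.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the state list y by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f, y0, t_end, n_steps):
    """Integrate from t = 0 to t_end using n_steps equal steps."""
    h = t_end / n_steps
    t, y = 0.0, list(y0)           # integration initial condition array
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def derivatives(t, y):
    """Model derivatives for the hypothetical mechanism A -> B -> C."""
    k1, k2 = 1.0, 0.5              # illustrative rate constants
    a, b, c = y
    return [-k1 * a, k1 * a - k2 * b, k2 * b]


def error_versus_step(step_counts, t_end=2.0):
    """Return (step size, absolute error in A) for each step count."""
    exact_a = math.exp(-1.0 * t_end)
    table = []
    for n in step_counts:
        a = integrate(derivatives, [1.0, 0.0, 0.0], t_end, n)[0]
        table.append((t_end / n, abs(a - exact_a)))
    return table
```

Reading the resulting table from coarse to fine step sizes shows where further refinement stops paying for its extra computation, which is the trade-off the text refers to as the most economical step size.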

These four programs (in object form) are incorporated 
into the Kinetic Data Analysis package to form an 
executable program which, upon request, will read in 
the data prepared by the Data Preparation Processor 
and print out the values of intermediate and dependent 
variables as a function of the independent variable. 
For all-digital studies, the user now has an executable 
optimization program capable of solving his problem. 

For hybrid studies, this program provides static 
check, dynamic check and scale factor information. If 
the user executes one digital solution to his problem 
(this will be clarified later), the results provide the 
information required to test the overall accuracy of a 
hybrid simulation and the running time of the all- 
digital model to compare hybrid versus digital eco- 
nomics. 

With the exception of reprocessing the card deck 
obtained from the data forms and requesting hybrid 
processing, no digital programming is required for 
hybrid studies. The Data Preparation Processor, in 
the hybrid mode, assigns hybrid interface channels to 
operate in conjunction with preprogrammed hybrid 
interface programs. Since data transferred to and from 
the analog model is done in a predefined sequence, 
the analog logic and interface circuits are also prede- 
fined and can be prepatched. Therefore, the additional 
effort required for hybrid studies is limited to the ana- 
log programming required to actually simulate the 
kinetic model. 

The Kinetic Data Analysis package has, in effect, 
organized the hybrid study and, with the aid of the 
static check, dynamic check, and scale factors deter- 
mined earlier, made programming and debugging the 
analog model a relatively simple task. The aforemen- 
tioned card input analog set-up program limits the 
time required to set up and check out analog programs 
to a few minutes. 

At execution time, the user communicates with the 
Data Preparation Processor, the optimization pro- 
grams, and the Program Executive through a set of 
predefined user oriented commands. These commands 
can be entered via cards for batch-unattended runs or
via a console typewriter. Since the Kinetic Data Analysis
package uses a “space” as a delimiter, commands are 
entered in “free format.” For example, the command 
“INPUT DATA 8”, which is used to read in the data 
tape from FORTRAN I/O unit 8, may start at 
any location on a punch card. 
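Free-format command entry with a space delimiter can be sketched as follows. The grammar assumed here (keyword tokens plus integer arguments) is inferred from the single example “INPUT DATA 8” and is not a description of the package's actual command processor; the function name is hypothetical.

```python
def parse_command(card):
    """Split a free-format command card on spaces.

    Returns (keywords, arguments), or None for a blank card. Tokens that
    parse as integers are treated as arguments; all others as keywords.
    """
    tokens = card.split()
    if not tokens:
        return None
    keywords, arguments = [], []
    for token in tokens:
        try:
            arguments.append(int(token))
        except ValueError:
            keywords.append(token.upper())
    return tuple(keywords), arguments
```

Because leading and repeated spaces are discarded, the same command is recognized wherever it starts on the card.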

The above mentioned command list, which contains 


more than fifty individual commands, is too extensive 
to discuss in detail. The commands can, however, be 
classified into the six areas of control they make avail- 
able to the user. 

1. Program Control: Select I/O devices, call Kinetic
Data Analysis programs, add to the program
library, etc.

2. Kinetic Data Handling: Control I/O options and
computations performed on experimental and
computed kinetic data.

3. Optimization Data Handling: Control I/O options
and computations associated with optimization
variables.

4. Objective Function Control: Control the mathe-
matical form, weighting and the components or
data sets used to compute the objective function
(see later discussion).

5. Optimization Algorithm Control: Select the mode
(e.g., maximize, minimize) and other options
(e.g., iterative, cyclic operation) associated
with the optimization algorithm.

6. Model Control and Diagnostic: Select hybrid
diagnostic options (e.g., scan for interface error
messages) or digital model control options (e.g.,
set or reset a one/zero model switch to modify
kinetic model).

The form of the results obtained by the user during 
program execution will be discussed later. 

Data capacity and classification 

The Kinetic Data Analysis package is capable of 
processing up to fifteen sets of experimental kinetic 
data (or data sets) which may contain concentration 
data for a maximum of fifteen chemical species or com- 
ponents. Each data set may contain up to ten values 
of an independent or sampling variable (e.g., time for
batch reactor, volume for flow reactor, etc.) and fifteen 
concentration points per sampling variable. These data 
must be common to all data sets and the sampling 
variable must be a monotonically increasing function
whose initial value is zero. However, equal sampling 
variable increments are not required. Each data set also 
contains provision for a catalyst concentration, a tem- 
perature, and an alphanumeric user identifier. The 
purpose and manipulation of the catalyst and tem- 
perature data will be discussed later. 
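The constraints on the sampling variable can be expressed as a short check. This is a hypothetical validator written for illustration; the constant and function names are assumptions, and "monotonically increasing" is read here as strictly increasing.

```python
MAX_SAMPLES = 10   # maximum values of the sampling variable per data set


def check_sampling_variable(values):
    """Return a list of messages describing violated constraints (empty if none)."""
    errors = []
    if len(values) > MAX_SAMPLES:
        errors.append("more than %d sampling values" % MAX_SAMPLES)
    if values and values[0] != 0:
        errors.append("initial value is not zero")
    if any(b <= a for a, b in zip(values, values[1:])):
        errors.append("values are not monotonically increasing")
    return errors
```

Unequal increments such as 0, 1, 2.5, 7 pass the check, consistent with the statement that equal sampling variable increments are not required.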

Up to fifteen unknown reaction rate constants, which 
are assumed to obey the Arrhenius equation, can be 
processed. This limit is independent of the thermal 
state of the system (i.e., isothermal or non-isothermal 
data sets). In addition, the Kinetic Data Analysis 
package can process up to fifteen unknown model or 
individual parameters (e.g., reaction orders, heat 
transfer coefficients, etc.). 

The above mentioned limits apply to all-digital 
studies and hybrid systems whose interface contains 
a minimum of sixteen analog to digital and digital to 
analog channels. 

The Data Preparation Processor categorizes ex-
perimental kinetic data into one of three classes called 
KDA Case Numbers. They are: 

Case #1: One or more experiments performed
under non-isothermal conditions.

Case #2: Two or more experiments performed
under isothermal conditions where
the difference between the maximum
and minimum temperature levels is
greater than 5°C or °F.

Case #3: One or more experiments performed
under isothermal conditions where
the temperature range is less than or
equal to 5°C or °F.

This data categorization is one of the key factors re-
quired, for example, to organize optimization algorithm 
input data and the transfer of rate constants to the 
kinetic model. 
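The three KDA Case Numbers can be written as a small decision function. The function name and argument layout are assumptions of this sketch; the 5-degree limit is taken from the case definitions above.

```python
def kda_case_number(temperatures, non_isothermal):
    """Classify a set of experiments into KDA Case #1, #2 or #3.

    temperatures: one data set temperature per experiment (degrees C or F).
    non_isothermal: True if any experiment was run non-isothermally.
    """
    if non_isothermal:
        return 1
    if max(temperatures) - min(temperatures) > 5:
        return 2
    return 3
```

A single isothermal experiment has a temperature range of zero and therefore falls into Case #3.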

Optimization variable transformations 

Two transformations, which play an important part
in the data flow between the various KDA processors,
are:

1. The transformation of rate constants and model
parameters into optimization algorithm variables.

2. The transformation and transfer of these variables
to the kinetic model.

Both transformations are a function of the aforementioned
KDA Case Number and the Arrhenius equation

$$K = A \exp(-B/T) \tag{1}$$

where 

K = reaction rate constant 
A, B = Arrhenius coefficients 
T = absolute temperature 

The Kinetic Data Analysis package uses an alternative, but rigorously correct, form of the Arrhenius equation whose derivation is shown in Appendix A. This relationship is

$$K = K_R \, (K_{HL})^{\beta} \tag{2}$$

where $\beta$ is defined as

$$\beta = \frac{1/T_R - 1/T}{1/T_L - 1/T_H} \tag{3}$$

In equation 3, $T_H$ and $T_L$ are the maximum and minimum experimental data temperatures, respectively, and $T_R$ is a mid-range reference temperature defined by the equation

$$1/T_R = (1/T_L + 1/T_H)/2 \tag{4}$$

In equation 2, $K_R$ denotes the reaction rate constant at $T_R$ and $K_{HL}$ is the ratio of the maximum to minimum rate constants ($K_{HL} = K_H/K_L$).
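
As a numerical check on equations 1 through 4, the following sketch computes the β factor and confirms that equation 2 reproduces the Arrhenius value. The function names are illustrative assumptions, and temperatures are assumed to be converted to kelvin with an offset of 273.15.

```python
import math


def beta_factor(t, t_low, t_high):
    """Equation 3, with the reference temperature T_R taken from equation 4."""
    inv_t_ref = (1.0 / t_low + 1.0 / t_high) / 2.0
    return (inv_t_ref - 1.0 / t) / (1.0 / t_low - 1.0 / t_high)


def rate_constant(k_r, k_hl, beta):
    """Equation 2: K = K_R * K_HL ** beta."""
    return k_r * k_hl ** beta


def arrhenius(a, b, t):
    """Equation 1: K = A * exp(-B / T), with T an absolute temperature."""
    return a * math.exp(-b / t)


def celsius_to_kelvin(t_c):
    return t_c + 273.15


if __name__ == "__main__":
    t_low, t_high = celsius_to_kelvin(130.0), celsius_to_kelvin(200.0)
    # Approximately -0.2420, the BETA FACTOR printed for the 146 C data set
    # in Figure 10 (temperature limits 130 and 200 C as listed in Figure 9).
    print(round(beta_factor(celsius_to_kelvin(146.0), t_low, t_high), 4))
```

By construction β equals -0.5 at the minimum temperature and +0.5 at the maximum temperature.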

For experimental data categorized as KDA Case #3, the optimization variables, $X_i$, are defined as:

$$X_i = K_i \tag{5}$$

where

i = rate constant index, i = 1, 2, ..., NRC
NRC = the total number of rate constants.

For the two remaining data categories

$$X_{2i-1} = K_{R,i} \tag{6}$$

and

$$X_{2i} = K_{HL,i} \tag{7}$$

Individual or model parameters specified by the user are sequentially added after the last rate constant variable. For example, the first parameter, $P_1$, is assigned to $X_{NRC+1}$ for KDA Case #3.
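
The ordering implied by equations 5 through 7 can be illustrated with a short sketch. The function name and the list-based layout are assumptions made for this example and are not taken from the package.

```python
def pack_variables(case, rate_constants, parameters=()):
    """Build the optimization variable vector X for a KDA Case Number.

    case 3:       rate_constants is a list of K_i values (equation 5).
    cases 1, 2:   rate_constants is a list of (K_R, K_HL) pairs, stored as
                  X[2i-1] = K_R and X[2i] = K_HL (equations 6 and 7, 1-based).
    parameters:   model parameters appended after the last rate constant.
    """
    x = []
    if case == 3:
        x.extend(rate_constants)
    else:
        for k_r, k_hl in rate_constants:
            x.extend((k_r, k_hl))
    x.extend(parameters)
    return x
```

With eleven rate constants and no model parameters, Cases 1 and 2 give 2 · NRC = 22 variables, while Case 3 gives NRC = 11.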

Referring to Figure 7, optimization variables are transferred to the kinetic model as a function of the KDA Case Number as shown in Table I. For hybrid kinetic models, the rate constants are scaled and transferred to the analog computer in a predefined transfer sequence as shown in Table II. Note that for both digital and hybrid models, concentration initial conditions, sampling points, and a ramp slope (i.e., reciprocal of the last data set sampling point) are also transferred to the kinetic model.



Figure 7 — Simplified objective function flow diagram 


TABLE I — Items influenced by KDA case number

                                     KDA CASE NUMBER
ITEM                                 1               2               3

TOTAL NUMBER OF OPTIMIZATION         2·NRC + NPR     2·NRC + NPR     NRC + NPR
VARIABLES

FORM ASSIGNED TO OPTIMIZATION        K_R, K_HL       K_R, K_HL       K
VARIABLES REPRESENTING
RATE CONSTANTS

RATE CONSTANT DATA TRANSFERRED       A, B            A, B            K
TO DIGITAL MODEL                     K_R, K_HL       K

RATE CONSTANT DATA TRANSFERRED       LOG (K_HL)      K               K
TO HYBRID MODEL                      LOG (K_R)


TABLE II — Typical transfer sequence* for KDA case #2



* Channel zero used by prepatched KDA circuits. 
** Transferred when applicable. 


Optimization algorithm and objective function 
options 

The current version of the Kinetic Data Analysis package uses a slightly modified version of the PARTAN algorithm described in detail by Harkins.⁴ Since a detailed description of the algorithm is available, this paper will only consider the mathematical form of the objective function. However, it should be noted that this algorithm, which can be classified as an “accelerated gradient” algorithm, was selected because of its proven effectiveness on a number of all-digital and hybrid kinetic studies performed in recent years at EAI Computation Centers. The add-on capability of the software package makes it possible to add other algorithms if the need exists.

The mathematical form of the objective function 
is specified by the user at execution time. Referring 
to Figure 7, the objective function is based on the 
“total error” or sum of the individual data set errors. 
For example, to compute the objective function for 
a problem consisting of ten components and ten data 
sets, ten analog runs or one hundred digital integrations 
are required. 

The form of the objective function, its weighting 
factors, the exclusion of a chemical species or data sets 
from the objective function, etc., are defined by the user 
at execution time via the Executive Program. The 
software package provides integral and polynomial 
objective function options to the user based on the fol- 
lowing definitions: 


$$E_{n,m,i} = \mathrm{COMP}_i \cdot \left| C_{n,m,i} - C^{*}_{n,m,i} \right|^{\mathrm{EXPN}}$$

COMP_i = 1.0, or 0 when a chemical species is to be excluded
OMIT_n = 1.0, or 0 when a data set is to be excluded
i = index denoting a chemical species, 1 ≤ i ≤ J
m = index denoting a sampling point, 1 ≤ m ≤ M
n = index denoting a data set or experiment, 1 ≤ n ≤ N
C_{n,m,i} = computed results (unscaled) array
C*_{n,m,i} = experimental results array
F = total objective function
FRUN_n = data set objective functions
EXPN = a positive, non-zero constant

The polynomial option defines individual data set objective functions as

$$\mathrm{FRUN}_n = \sum_{m} \mathrm{WGT}_{n,m} \sum_{i} E_{n,m,i} \tag{8}$$

where the weighting factor ($\mathrm{WGT}_{n,m}$) is

$$\mathrm{WGT}_{n,m} = 1 + \langle \text{PW1 term, not legible} \rangle + \mathrm{PW2} \cdot \theta_m \tag{9}$$

In the above relationship $\theta$ denotes a positive sampling variable ratio whose maximum value is unity:

$$\theta_m = \frac{SV_m}{SV_M} \tag{10}$$

The weighting factor is unity if PW1 and PW2 are zero. If PW1 = 1.0 and PW2 = 0, initial values are weighted, and if PW2 = 1.0 and PW1 = 0, final values are weighted. Note that both PW1 and PW2 cannot simultaneously be set to one.

The integral option defines individual data set objective functions as

$$\mathrm{FRUN}_n = \sum_{i} \mathrm{WGT}_{n,i} \int_0^{SV_M} E_{n,m,i} \, d(SV) \tag{11}$$

where the integral is computed using a “Trapezoidal Rule” approximation and the weighting factor ($\mathrm{WGT}_{n,i}$) is defined as

$$\mathrm{WGT}_{n,i} = 1 + \mathrm{CW1} \cdot C_{n,i} + \mathrm{CW2} \cdot (1 - C_{n,i}) \tag{12}$$

The control constants CW1 and CW2 are identical in behavior to PW1 and PW2. The $C_{n,i}$ values are concentration weighting factors computed from experimental data by the Data Preparation Processor.

If CW1 = 1.0 and CW2 = 0, large concentrations are weighted, and if CW2 = 1.0 and CW1 = 0, small concentrations are weighted. This weighting factor is useful when, for example, a component whose range is 0 to 0.05 in a given experiment is more sensitive to an analytical error of, say, ±0.01 than a component whose range is 0.5 to 1.0.

Referring to Figure 7, the total objective function, F, is obtained by summing the individual data set errors, FRUN_n, modified by OMIT_n (1.0 or 0) to control the inclusion or exclusion of the various data sets.

$$F = \sum_{n=1}^{N} \mathrm{OMIT}_n \cdot \mathrm{FRUN}_n \tag{13}$$

Note that user commands control the values assigned to OMIT_n, COMP_i, EXPN, PW1, PW2, CW1 and CW2.
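
The integral option (equations 11 through 13) can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the function names and the nested-list layout (`computed[i][m]` for species i at sampling point m) are hypothetical, and the polynomial option is left out because its PW1 weighting term could not be recovered.

```python
def point_error(computed, experimental, comp=1.0, expn=1.0):
    """E = COMP * |C - C*| ** EXPN for one species at one sampling point."""
    return comp * abs(computed - experimental) ** expn


def trapezoid(xs, ys):
    """Trapezoidal rule over possibly unequal sampling increments."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(xs) - 1))


def frun_integral(sv, computed, experimental, conc_weight,
                  cw1=0.0, cw2=0.0, comp=None, expn=1.0):
    """Equation 11 for one data set, using the weighting of equation 12."""
    total = 0.0
    for i, (c_row, e_row) in enumerate(zip(computed, experimental)):
        flag = 1.0 if comp is None else comp[i]
        wgt = 1.0 + cw1 * conc_weight[i] + cw2 * (1.0 - conc_weight[i])
        errors = [point_error(c, e, flag, expn) for c, e in zip(c_row, e_row)]
        total += wgt * trapezoid(sv, errors)
    return total


def total_objective(fruns, omit):
    """Equation 13: F is the sum of OMIT_n * FRUN_n over the data sets."""
    return sum(o * f for o, f in zip(omit, fruns))
```

Setting an entry of `comp` or `omit` to zero removes that species or data set from the sum, which mirrors the role of COMP_i and OMIT_n in the definitions above.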

In addition to the aforementioned objective functions, the software package has provision for the user to add a digital subroutine to compute the individual data set errors if the “built-in” options are not applicable. For example, if the data set errors are computed on the analog computer this subroutine can be used to transfer them into the digital computer.

Optimization results include a table containing the objective function, its fractional contribution to the total objective function, and the average error per data point for each data set. The total absolute error* or standard error is included in all results to allow the user to compare the relative merits of various objective functions since their magnitudes depend on their mathematical form.

Temperature and catalyst data 

Each of the data sets has associated with it a single
temperature, which is sufficient for experiments performed
under isothermal conditions (i.e., KDA Cases
#2 and #3). For non-isothermal situations the data
set temperature is the initial or feed temperature; 
therefore, the requirements of kinetic models which 
include energy balances (i.e., temperature obtained 
from the solution of a differential equation) are also 
satisfied. 

Studies that require the storage of, say, temperature 
versus time data are simulated by : 

1. Using “Data” statements to include these data 
in the subroutines supplied by the user for all 
digital studies. 

2. Using, say, card programmed diode function 
generators (CPDFG) on the analog computer 
for hybrid studies. 

The CPDFGs work in conjunction with prepro- 
grammed logic that automatically associates each 
function with the appropriate data set during the 
simulation. 

The software package also allows the user to asso- 
ciate a catalyst concentration with each data set. The 
catalyst concentration, which is transferred to the 
kinetic model, provides the user with a mechanism for 
simulating kinetic models involving a non-reactive or 
reactive catalyst. For example, when catalyst concen- 
tration data is not available in studies involving re- 
active catalysts, the catalyst concentration is the initial 
condition for the catalyst material balance equation. 

Typical application 

The following discussion will be devoted to the solution of the “Monsanto Benchmark Problem” using the Kinetic Data Analysis package on a fully expanded EAI 8900 Hybrid Computer. This discussion will include a mathematical description of the problem, illustrate the form of the results obtained during the preparation and optimization phases of the study, and summarize the numerical results obtained from the study. Simulation accuracy, errors in results, and economics will also be discussed.

* Equation 8 with EXPN and WGT_{n,m} equal to unity.

Problem description 

The illustrative problem contains the two essential ingredients to perform a kinetic data analysis study: a proposed kinetic model and experimental data. Referring to Table III, each of the thirteen available data sets contained concentration-time data for seven chemical species (i.e., R, S, T, U, W, X, and Y), the concentration of a non-reactive catalyst, and a temperature. These data were obtained from experiments performed under isothermal conditions over a 133 to 181°C temperature range which included a threefold variation in catalyst concentration, 117 to 368. No two data sets had identical initial concentrations and the number of non-zero sampling variable (i.e., time) points per data set varied from one to four.

The proposed kinetic model, which is shown in Table 
IV, is based on the following chemical equations: 

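$$R + S \;\underset{K_8}{\overset{K_1}{\rightleftharpoons}}\; T \xrightarrow{K_4} U$$

$$R + S \xrightarrow{K_2} U$$

$$R + S \;\underset{K_9}{\overset{K_3}{\rightleftharpoons}}\; W$$

$$T + S \xrightarrow{K_5} X \qquad U + S \;\underset{K_{10}}{\overset{K_6}{\rightleftharpoons}}\; X$$

$$U + S \;\underset{K_{11}}{\overset{K_7}{\rightleftharpoons}}\; Y$$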

The model contained eleven unknown rate constants ($K_1$ through $K_{11}$) and since this study falls under the KDA Case #2 category, there are a total of twenty-two optimization variables. Each rate constant has one $K_R$ and one $K_{HL}$ optimization variable associated with it.

Data preparation processor results 

Processing the card deck corresponding to the KDA Data Forms produced the results indicated in Figure 5, which are illustrated by Figures 8 through 11. These figures omit the first phase of the form processing output, that is, the direct playback of the KDA Data Forms with appropriate error messages when errors are detected.

TABLE III — Typical data set

IDENTIFIER: RUN TWO    TEMPERATURE: 146°C    CATALYST CONCENTRATION: 117

TIME      CONCENTRATION IN MASS FRACTION
HOURS     R       S       T       U       W       X       Y
0.0       0.425   0.501   0.018   0.005   0.050   -       -
1.0       0.359   0.465   0.051   0.017   0.106   -       0.002
2.0       0.315   0.442   0.086   0.033   0.120   -       0.004
3.0       0.281   0.424   0.123   0.048   0.116   -       0.008

ADC ASSIGNMENTS

ADC CHANNEL   VARIABLE   MAXIMUM       SCALE
NUMBER        NAME       VALUE         FACTOR
1             COMP R     0.1000E 01    0.1000E 01
2             COMP S     0.1000E 01    0.1000E 01
3             COMP T     0.1000E 01    0.1000E 01
4             COMP U     0.1000E 01    0.1000E 01
5             COMP W     0.5000E 00    0.2000E 01
6             COMP X     0.5000E 00    0.2000E 01
7             COMP Y     0.1000E 00    0.1000E 02

DAC ASSIGNMENTS

DAC CHANNEL   VARIABLE       MAXIMUM       SCALE         LOG (KHLMX)
NUMBER        NAME           VALUE         FACTOR        (CASE 1 ONLY)
1             RATE CON 1     0.1500E 02    0.6666E-01
2             RATE CON 2     0.8000E 01    0.1250E 00
3             RATE CON 3     0.8000E 01    0.1250E 00
4             RATE CON 4     0.4000E 00    0.2500E 01
5             RATE CON 5     0.1000E 01    0.1000E 01
6             RATE CON 6     0.1000E 01    0.1000E 01
7             RATE CON 7     0.2500E 00    0.4000E 01
8             RATE CON 8     0.8000E 01    0.1250E 00
9             RATE CON 9     0.5000E 02    0.2000E-01
10            RATE CON 10    0.1000E 01    0.1000E 01
11            RATE CON 11    0.4000E 01    0.2500E 00

Figure 8 — Hybrid interface assignments


TEMPERATURE IN DEG C            MINIMUM 130.0    MAXIMUM 200.0
CATALYST (UNKNOWN) IN UNKNOWN   MINIMUM 0.117E 03

SCALED CATALYST-TEMPERATURE DATA

DATA SET                SCALED         SCALED
NO.    IDENTIFIER       TEMPERATURE    CATALYST CONC
1      RUN ONE          0.7300         0.3315
2      RUN TWO          0.7300         0.3179
3      RUN 3            0.9000         0.6440
4      RUN FOUR         0.8100         0.3288
5      RUN FIVE         0.6100         0.6576
6      RUN SIX          0.7900         0.6522
7      RUN 7            0.7400         0.6522
8      RUN 8            0.8350         0.6522
9      RUN NINE         0.8500         0.6522
10     RUN TEN          0.6900         0.6522
11     RUN 11           0.6650         0.6739
12     RUN 12           0.9050         0.6141
13     RUN 13           0.8650         1.0000

CAT CONC AND TEMP XFER ON DAC 14 AND 15 DURING 'B' PERIOD

Figure 9 — Temperature-catalyst interface data transfer


Figure 8 illustrates the hybrid interface assignments 
for the eleven reaction rate constants and the seven 
chemical species involved in the mathematical model, 
their maximum values, and their scale factors (i.e., 
reciprocal of maximum value). Figure 9 details the 
scaled temperatures and catalyst concentrations that 
will be transferred to the analog model during the 
“B” demultiplexing period on D/A-channels 14 and 
15. Note that this problem is in the KDA Case #2 
category whose interface transfer sequence has been 
illustrated in Table II. 

Referring to Figure 10, the Data Preparation Proc- 
essor assigns a number to both the data sets and 
chemical species involved in the study. These numbers 
are required by the user to execute commands that 
manipulate specific chemical components or data sets. 
For example, to exclude the eleventh data set from the 
study, the command is “EXCLUDE 11” not “EX- 
CLUDE RUN 11” where “RUN 11” is the data set 
identifier specified by the user. 

The lower half of Figure 10 illustrates a typical data set printout containing the original “time” units and scaled (i.e., normalized) values of the sampling variable. The normalized values were obtained by:

NAME / KDA NUMBER SUMMARY

KDA NUMBER    COMP NAME
1             COMP R
2             COMP S
3             COMP T
4             COMP U
5             COMP W
6             COMP X
7             COMP Y

KDA NUMBER    DATA SET IDENT
1             RUN ONE
2             RUN TWO
3             RUN 3
4             RUN FOUR
5             RUN FIVE
6             RUN SIX
7             RUN 7
8             RUN 8
9             RUN NINE
10            RUN TEN
11            RUN 11
12            RUN 12
13            RUN 13

DATA SET NUMBER 2    USER IDENTIFIER RUN TWO    TEMPERATURE 146.0 C
CATALYST CONCENTRATION 0.1170E 03 UNKNOWN    CATALYST RATIO 0.1000E 01
BETA FACTOR -0.2420

SAMPLING POINTS
TIME          0.000E 00    0.100E 01    0.200E 01    0.300E 01
NORMALIZED    0.0000       0.3333       0.6667       0.9999

COMPONENT     CONCENTRATION IN WGT FRAC
COMP R        0.425E 00    0.359E 00    0.315E 00    0.281E 00
COMP S        0.501E 00    0.465E 00    0.442E 00    0.424E 00
COMP T        0.180E-01    0.510E-01    0.860E-01    0.123E 00
COMP U        0.500E-02    0.170E-01    0.330E-01    0.480E-01
COMP W        0.500E-01    0.106E 00    0.120E 00    0.116E 00
COMP X        0.000E 00    0.000E 00    0.000E 00    0.000E 00
COMP Y        0.000E 00    0.200E-02    0.400E-02    0.800E-02
CONC SUM      0.999E 00    0.100E 01    0.100E 01    0.100E 01

Figure 10 — KDA number assignments and processed data set


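TABLE IV — Mathematical model

DEFINITION OF TERMS

$$\begin{aligned}
R_1 &= K_1 RS & R_6 &= K_6 US & R_{11} &= K_{11} Y \\
R_2 &= K_2 RS & R_7 &= K_7 US & R_{12} &= R_1 + R_2 + R_3 \\
R_3 &= K_3 RS & R_8 &= K_8 T & R_{13} &= R_4 + R_8 \\
R_4 &= K_4 T & R_9 &= K_9 W & R_{14} &= R_6 + R_7 \\
R_5 &= K_5 TS & R_{10} &= K_{10} X & &
\end{aligned}$$

t = time
CAT = catalyst concentration
$\alpha = \mathrm{CAT}/(\mathrm{CAT})_{MIN}$ = catalyst ratio
MRT, MSR, MST, etc. = molecular weight ratios; $\theta = \alpha t$

MATERIAL BALANCE EQUATIONS

$$\begin{aligned}
\frac{dR}{d\theta} = \frac{dR}{\alpha\,dt} &= -R_{12} + (\mathrm{MRT})R_8 + (\mathrm{MRW})R_9 \\
\frac{dS}{d\theta} = \frac{dS}{\alpha\,dt} &= -(\mathrm{MSR})R_{12} - R_{14} - R_5 + (\mathrm{MST})R_8 + (\mathrm{MSW})R_9 + (\mathrm{MSX})R_{10} + (\mathrm{MSY})R_{11} \\
\frac{dT}{d\theta} = \frac{dT}{\alpha\,dt} &= (\mathrm{MTR})R_1 - R_{13} - (\mathrm{MTS})R_5 \\
\frac{dU}{d\theta} = \frac{dU}{\alpha\,dt} &= (\mathrm{MUR})R_2 + (\mathrm{MUT})R_4 + (\mathrm{MUX})R_{10} + (\mathrm{MUY})R_{11} - (\mathrm{MUS})R_{14} \\
\frac{dW}{d\theta} = \frac{dW}{\alpha\,dt} &= (\mathrm{MWR})R_3 - R_9 \\
\frac{dX}{d\theta} = \frac{dX}{\alpha\,dt} &= (\mathrm{MXS})(R_5 + R_6) - R_{10} \\
\frac{dY}{d\theta} = \frac{dY}{\alpha\,dt} &= (\mathrm{MYS})R_7 - R_{11}
\end{aligned}$$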


DATA SET                 CHEMICAL SPECIES
NO.    IDENTIFIER        1-5 / 6-10 / 11-15

1      RUN ONE           0.5215   0.2410   0.0630   0.0335   0.1065
                         0.0000   0.0150

2      RUN TWO           0.3450   0.4560   0.0695   0.0256   0.0900
                         0.0000   0.0035

3      RUN 3             0.1550   0.3100   0.1023   0.2400   0.0197
                         0.1553   0.0160

PARTAN DATA SUMMARY

PARAMETER DATA

NAME    RANGE         VALUE         MAXVAL        MINVAL
KR01    0.1000E 01    0.4300E 00    0.1000E 01    0.0000E 00
HL01    0.1990E 03    0.1040E 03    0.2000E 03    0.1000E 01
KR02    0.4000E 00    0.1400E 00    0.4000E 00    0.0000E 00
HL02    0.3990E 03    0.1060E 03    0.4000E 03    0.1000E 01
KR03    0.1000E 01    0.7100E 00    0.1000E 01    0.0000E 00

Figure 11 — Concentration weighting factors and algorithm input data


1. Performing the catalyst transformation shown
in Table IV, which was the result of a “yes”
answer to the question, “CATALYTIC REACTIONS?”
(see Figure 4).

2. Dividing all values by the maximum sampling 
point to form the “normalized” values or scaled 
sampling points. 

These results also contain concentration and rate 
summations for each time point to assist the user in 
evaluating the consistency of the data based on ma- 
terial balance. The rates, which are not shown in 
Figure 10, were computed numerically by differentiating 
a polynomial whose coefficients are determined by a 
least square fit of the concentration data. 
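
The two normalization steps and the concentration-sum check can be illustrated with the RUN TWO data of Table III. The sketch below assumes that the catalyst transformation multiplies each time point by the catalyst ratio before scaling; the function names are hypothetical, and blank table entries are treated as zero, as in the Figure 10 printout.

```python
def normalize_sampling_points(times, catalyst_ratio=1.0):
    """Apply the catalyst transformation, then divide by the largest point."""
    scaled = [catalyst_ratio * t for t in times]
    largest = max(scaled)
    return [s / largest for s in scaled]


def concentration_sums(table):
    """Sum all species at each sampling point as a material balance check."""
    n_points = len(next(iter(table.values())))
    return [sum(series[m] for series in table.values())
            for m in range(n_points)]


# RUN TWO from Table III (mass fractions at 0, 1, 2 and 3 hours).
RUN_TWO = {
    "R": [0.425, 0.359, 0.315, 0.281],
    "S": [0.501, 0.465, 0.442, 0.424],
    "T": [0.018, 0.051, 0.086, 0.123],
    "U": [0.005, 0.017, 0.033, 0.048],
    "W": [0.050, 0.106, 0.120, 0.116],
    "X": [0.0, 0.0, 0.0, 0.0],
    "Y": [0.0, 0.002, 0.004, 0.008],
}

if __name__ == "__main__":
    print(normalize_sampling_points([0.0, 1.0, 2.0, 3.0]))
    print([round(s, 3) for s in concentration_sums(RUN_TWO)])
```

A data set whose sums stray far from unity, as reported below for the eleventh data set, is a candidate for exclusion.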

Figure 11 illustrates the concentration weighting factors and the input data to the PARTAN Algorithm. Note that the Data Preparation Processor has assigned names, for example, “KR01”, to the optimization variables and placed them in a “type three” category. This means they are constrained between an upper and lower limit denoted by “MAXVAL” and “MINVAL”. The initial values of the variables are in the “VALUE” column.

The results of the preprocessing indicated that the eleventh data set should be excluded from the study because its concentration sums indicated as much as ten percent error. Therefore, optimization results were obtained using twelve, rather than thirteen, data sets.

Optimization results 

Figures 12 through 15 illustrate the form of some of the results obtained from the hybrid solution of the problem. Figure 12 illustrates the user commands, which are documented as they are processed, and an optimization summary. The summary is updated every time the algorithm detects an improvement in the objective function during the execution of an optimization run. It keeps a running record of the total number of gradient evaluations and objective function evaluations and notes any optimization variables at their upper or lower limit (not shown in Figure 12). The alpha and beta values pertain to the algorithm perturbations, etc.

[Figure 12 printout: KDA data tape identifier (user: Monsanto Company, St. Louis, Mo., project engineer Paul Parisot; EAI: project number 100609, project engineer A. Carlson, Sept. 1968); user commands TAPE UNIT 9, RESTORE PARTAN PLOT, EXCLUDE 11, INTEGRAL OBJECTIVE FUNCTION, WEIGHT LARGE CONCENTRATIONS, ERROR EXPONENT 1.0; optimization summary with columns improvement number, objective function, no. of func. evaluations, no. of grad. evaluations, current alpha, current beta, ending at improvement 30 with an objective function of 0.3827E 01.]

Figure 12 — Typical executive program output and optimization summary

[Figure 13 printout: objective function summary with columns data set number, identifier, absolute error, error fraction, average error for the twelve included data sets; PERCENT REPRO ERROR 0.650; detailed experimental, computed, and difference values for data set number 2 (RUN TWO, 146.0 C).]

Figure 13 — Typical objective function summary and detailed data set results

[Figure 14: line printer plot of the objective function against the improvement number.]

Figure 14 — Typical line printer objective function vs. No. of improvements plot

Figure 15 — Typical concentration results output

During the optimization process the optimization 
summary is the only output available to the user with 
the exception of a percent improvement indicated on 
the analog computer digital voltmeter. The percent 
improvement is relative to the initial or base value of 
the objective function. 

After the optimization process has been completed, 
the previously mentioned objective function summary 
is obtained (see Figure 13) which includes a reproduci- 
bility error. Referring to Figures 13 and 14, the user 
may also request a detailed comparison of experimental 
to computed results and a line printer plot of the ob- 
jective function or any of the optimization variables 
as a function of the number of improvements. 

The objective function summary allows the user to determine if, for example, any one data set is making an excessively large contribution to the objective function. The reproducibility factor, which is typically zero for all-digital studies, is obtained by re-evaluating the objective function under “best fit” conditions after the optimization process has been completed. The percent error between the two objective functions is the percent reproducibility error shown in Figure 13. It reflects the total error introduced into the objective function by the hybrid interface, analog components, etc. As shown in Figure 13, this error was typically less than one percent.
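
The reproducibility calculation amounts to a percent difference between two evaluations of the same objective function. The sketch below assumes the error is taken relative to the value found during optimization; the text does not state which of the two values serves as the reference, and the function name is hypothetical.

```python
def percent_reproducibility_error(f_best, f_repeat):
    """Percent difference between the best-fit objective function value and
    its re-evaluation under the same conditions (reference value assumed)."""
    if f_best == 0.0:
        raise ValueError("reference objective function value must be non-zero")
    return abs(f_repeat - f_best) / abs(f_best) * 100.0
```

For an all-digital study the two evaluations are identical, so the function returns zero.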

The objective function plots allow the user to 
graphically follow the path of the optimization process. 
However, plots of specific optimization variables 
versus the number of improvements are more impor- 
tant. They indicate the activity or sensitivity of 
variables during optimization and allow the user to 
take appropriate action if, for example, a variable 
always remained essentially constant. 

Figure 15 shows a concise final results plot that can
be requested via the appropriate user command. This
plot, which is obtained on the analog strip chart recorder,
consists of a sample variable ramp (θ) and
a set of curves for the computed concentrations. The
“blips” on the concentration curves represent the 
deviation between the curves and experimental data 
points; therefore, the absence of “blips” represents a 
near perfect or perfect fit. The pulse prior to each ramp 
denotes the data set number. The first data set is 
preceded by a 10 volt pulse, the second by a 20 volt 
pulse, etc. 

Problem solution and results 

To avoid the possibility of confusing a local minimum on the error surface with the true minimum, sets of optimization runs were always made starting from four points on the error surface. The four sets of starting values used were the maximum and minimum values of the optimization variables, their arithmetic average values, and the initial or “best guess” values. The problem was solved using the following iterative process:

1. Perform four separate, complete optimization 
runs using the maximum, minimum, average, 
and initial values of the optimization variables. 

2. Examine the results and determine if the final 
values of the objective function and optimiza- 
tion variables show good agreement. 

3. If the results of step two indicate more runs 
are required, refine the four sets of starting values 
based on their results and repeat the first step. 
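
The three steps above can be sketched as a small multi-start driver. Everything here is illustrative: `optimize` stands for any routine that takes a starting vector and returns a pair (objective value, final variables), the two percent agreement tolerance is an assumption, and restarting each run from its own final point is only one possible reading of “refine the four sets of starting values”.

```python
def starting_points(lower, upper, initial):
    """The four starting locations: initial, maximum, minimum and average."""
    average = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    return {"initial": list(initial), "maximum": list(upper),
            "minimum": list(lower), "average": average}


def runs_agree(final_values, rel_tol=0.02):
    """Step 2: do the final objective function values agree closely enough?"""
    spread = max(final_values) - min(final_values)
    return spread <= rel_tol * abs(min(final_values))


def multi_start(optimize, lower, upper, initial, max_iterations=4, rel_tol=0.02):
    """Repeat four complete optimization runs until their results agree."""
    starts = starting_points(lower, upper, initial)
    results = {}
    for _ in range(max_iterations):
        # Step 1: one complete optimization run from each starting location.
        results = {name: optimize(x0) for name, x0 in starts.items()}
        # Step 2: stop when the four final objective values agree.
        if runs_agree([f for f, _ in results.values()], rel_tol):
            break
        # Step 3 (assumed refinement): restart each run from its final point.
        starts = {name: x for name, (_, x) in results.items()}
    return results
```

Applied to the values later reported in Table V, the agreement test fails for the starting values (8.28, 5.76, 25.2, 5.65) and passes for the fourth iteration (3.00, 2.98, 2.99, 3.01).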

This iteration process was repeated three times using 
the integral form of the objective function with large 
concentration weighting and an error exponent equal 
to unity. Referring to Table V, the values of the ob- 
jective function for these three iterations are reported 
in standard error form (i.e., the unweighted sum of 
the absolute concentration errors). After the third itera- 
tion, the mathematical form of the objective function 
was changed to the standard form to eliminate the 
effects of the concentration weighting and the results 
of this iteration indicated that for all practical pur- 
poses, the “best fit” had been obtained. 

The four sets of optimization variables obtained from the fourth iteration showed reasonably good but not perfect agreement. The error introduced into specific reaction rate constants by differences in the final values of the optimization variables was computed using the error form of equation 2; namely,



where $\Delta K_R$ and $\Delta K_{HL}$ are the most probable errors and $K_R$ and $K_{HL}$ are the average values of the individual optimization variables.⁹ The results of this analysis are shown in Table VI. Note that the absolute percent error of any one rate constant is a function of temperature or $\beta$, whose range is ±0.5.
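
The error expression itself is not legible in this copy. As a hedged reconstruction, a first-order expansion of equation 2 in logarithmic form, ln K = ln K_R + β ln K_HL, gives a percent error in K equal to the percent error in K_R plus |β| times the percent error in K_HL. The sketch below applies that assumed relation; with |β| = 0.5 it reproduces several of the maximum error entries of Table VI from the K_R and K_HL columns, for example 0.32 + 0.5 · 3.48 = 2.06 for the first rate constant.

```python
def rate_constant_percent_error(err_k_r, err_k_hl, beta):
    """First-order percent error in K = K_R * K_HL ** beta.

    err_k_r, err_k_hl: percent errors of K_R and K_HL.
    beta:              the temperature factor of equation 3 (-0.5 to +0.5).
    This linear form is an assumption checked against Table VI, not a
    transcription of the authors' equation.
    """
    return err_k_r + abs(beta) * err_k_hl
```

At the reference temperature (β = 0) the error reduces to that of K_R alone, which corresponds to the minimum error column of Table VI.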

Simulation Accuracy 

Comparisons between equivalent hybrid and all-digital optimization runs were made to determine how analog component or digital integration errors influenced results. This comparison was based on the standard objective function value obtained after one function evaluation. Using both single and double precision digital integration, a comparison of objective function values showed good agreement between the digital and hybrid results. Both the hybrid and single precision digital integration results were within approximately ±1% of the results obtained using double precision integration. These minor differences were traced to errors of less than 0.001 in computed concentration data points.

TABLE V — Objective function results

STARTING LOCATION     INITIAL    MAXIMUM    MINIMUM    AVERAGE
STARTING VALUES*      8.28       5.76       25.2       5.65
ITERATION 1*          3.76       3.15       3.59       3.65
ITERATION 2*          3.15       3.08       3.12       3.08
ITERATION 3*          3.06       3.03       3.11       3.08
ITERATION 4*          3.00       2.98       2.99       3.01

* Standard error equivalent of weighted integral objective function.

TABLE VI — Rate constant error analysis results

       ABSOLUTE PERCENT ERROR
i      K_R (MINIMUM ERROR)    K_HL     K_i (MAXIMUM ERROR)
1      0.32                   3.48     2.06
2      0.44                   0.83     0.86
3      0.26                   5.43     2.98
4      0.30                   4.27     2.43
5      1.06                   1.96     2.04
6      1.60                   4.48     3.84
7      2.05                   1.98     3.04
8      0.59                   3.18     2.18
9      0.23                   3.49     1.97
10     3.22                   5.14     5.79
11     2.46                   4.33     4.62

One comparison of equivalent all-digital versus hybrid optimization runs was made. Although both solutions differed slightly when their optimization summaries (see Figure 12) were compared, the final objective function and optimization variable results obtained were identical for all practical purposes (i.e., one or two percent difference). This would seem to indicate that the errors associated with experimental data and the mathematical model will have a greater influence on results than the relatively minor errors introduced by digital integration or analog components. It was also concluded that double precision integration accuracy was not worth the additional computation time it required compared to single precision integration.

Figure 16 — Typical hybrid-digital economic plot

Simulation economics 

The above discussion indicates there is no technical 
advantage to be gained by using a hybrid rather than 
an all-digital simulation to solve a kinetics problem 
with the KDA package. Therefore, two questions of 
interest are : 

1. Is there an advantage to using one type of
computer?

2. How does one determine which computer to use 
for specific problems? 

The answer to the first question is there is an economic 
“break-even” point (see Figure 16) that governs the 
selection of a hybrid computer over a digital computer 
or vice versa. This “break-even” point is created when 
the simulation of the kinetic model requires the solu- 
tion of a set of differential equations and the digital 
cost per optimization run is in excess of the equivalent 
hybrid cost. 

A hybrid solution is practical when the hybrid economic advantage during the production phase of a kinetic study offsets and surpasses the deficit encountered during the problem preparation phase. Recalling previous discussions, to perform a kinetic study using the Kinetic Data Analysis package the analog programming task is superimposed on the normal preparations required for an all-digital study. This creates an obvious hybrid deficit which combines with the hybrid cost advantage during the execution of the optimization program to create an economic “break-even” point.

The economics associated with the hybrid versus
all-digital question should be considered carefully because
sufficient savings can be realized by making the correct
decision. For example, a recent hybrid versus all-digital
economic study for a reactor control problem 1
indicated that a large scale hybrid computer had
approximately a 20:1 time and 40:1 cost advantage over
large scale, third generation digital computers (e.g.,
$1,200 per hour computation center rate), and a 60:1
hybrid time advantage for the solution of the “Monsanto
Benchmark Problem” has been reported in the
literature. 6

The hybrid cost advantage is directly related to the
average computation time required to simulate a data
set or experiment. The analog computer typically
requires 10-20 milliseconds to simulate one data set,
which is independent of problem complexity. The time
required for the equivalent digital simulation is a
function of the speed of the digital computer, the
number of equations, their degree of nonlinearity, and
the integration algorithm. The influence of the digital
integration algorithm on this situation is minor since
the analog computer can be “speeded-up” more readily
than the algorithm.

The second question, how one decides between an
all-digital and a hybrid solution for a specific problem,
is very difficult to answer due to lack of information.
However, based on
information obtained from several hybrid optimization 
studies performed on EAI 8900 Hybrid Computers, it 
was possible to derive some “rules-of-thumb” or guide- 
lines. These relationships, which are based on a variety 
of studies involving up to twenty-six optimization 
variables, are admittedly crude. 

The time required to execute one hybrid optimiza- 
tion run, including detailed printouts and tape manipu- 
lation, can be estimated using: 

$T_H \approx 3 \cdot NOV \cdot NDS / 100$ (15)

where

T_H = time per hybrid optimization run, minutes
NOV = total number of optimization variables
NDS = total number of data sets

An approximate relationship to determine the
equivalent time, T_D, for a digital optimization is:

$T_D \approx NOV \cdot NDS \cdot DST / 1500$ (16)

where DST is the average number of milliseconds re- 
quired to simulate one data set. This relationship does 
not include the time required for on-line I/O operations, 
which are not important if a competitive hybrid/digital 
situation exists. 
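
The two estimates in (15) and (16) can be sketched in a few lines of Python. The function names and the sample study size are illustrative assumptions, not part of the KDA package, and the digital estimate is assumed here to be in minutes like the hybrid one (the text does not state its units).

```python
def hybrid_run_minutes(nov: int, nds: int) -> float:
    """Equation (15): estimated minutes for one hybrid optimization run."""
    return 3.0 * nov * nds / 100.0


def digital_run_time(nov: int, nds: int, dst_ms: float) -> float:
    """Equation (16): estimated time for one digital optimization run.

    dst_ms is the average number of milliseconds needed to simulate one
    data set. The result is ASSUMED to be in minutes, like equation (15).
    """
    return nov * nds * dst_ms / 1500.0


# Hypothetical study: 10 optimization variables, 20 data sets,
# 450 ms of digital computation per data set.
t_hybrid = hybrid_run_minutes(10, 20)
t_digital = digital_run_time(10, 20, 450.0)
print(t_hybrid, t_digital, t_digital / t_hybrid)
```

Under the same-units assumption the ratio of (16) to (15) reduces to DST/45, so in this sketch the two estimates coincide when a data set takes about 45 milliseconds to simulate digitally.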

A crude economic plot, see Figure 16, may be obtained
from the equations:

$C_H = C_H^P + (R_H \cdot T_H + C_H^E) \cdot NOR$ (17)

and

$C_D = C_D^P + (R_D \cdot T_D + C_D^E) \cdot NOR$ (18)

where

C_H, C_D = total hybrid and digital simulation costs
C_H^P, C_D^P = estimated hybrid and digital preparation costs
R_H, R_D = hybrid and digital computer rates
C_H^E, C_D^E = engineering costs per optimization run
NOR = estimated number of optimization runs

The engineering costs associated with the execution
and analysis of the optimization runs, C_H^E and C_D^E, are
not necessarily identical. For example, in the illustrative
problem, four sets of four hybrid optimization
runs (NOR = 16) were required and the engineering 
effort was four man days. An all-digital study could 
have required as long as, say, sixteen days to execute 
on a “slow” digital computer and required, say, eight 
man days of engineering. 
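
As a rough illustration of how equations (17) and (18) produce a “break-even” point, the sketch below solves for the number of optimization runs at which the two total costs are equal. All names and dollar figures are hypothetical, and the computer rate is assumed to be expressed per minute so that it matches the run time of equation (15).

```python
def total_cost(prep_cost, rate_per_min, run_minutes, eng_cost_per_run, n_runs):
    """Equations (17)/(18): preparation cost plus per-run cost times NOR."""
    return prep_cost + (rate_per_min * run_minutes + eng_cost_per_run) * n_runs


def break_even_runs(prep_h, per_run_h, prep_d, per_run_d):
    """Number of runs at which hybrid and digital total costs are equal.

    per_run_* is the bracketed term of (17)/(18): rate * time + engineering.
    Returns None when the hybrid per-run cost is not lower, in which case
    the extra hybrid preparation cost is never recovered.
    """
    if per_run_h >= per_run_d:
        return None
    return (prep_h - prep_d) / (per_run_d - per_run_h)


# Hypothetical figures: hybrid preparation costs more, each hybrid run costs less.
nor_star = break_even_runs(prep_h=5000.0, per_run_h=100.0,
                           prep_d=2000.0, per_run_d=350.0)
print(nor_star)
```

For studies needing more runs than this value the hybrid total is lower; for fewer runs the all-digital total is lower.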

The application of the above mentioned economic 
analysis to the “Monsanto Benchmark Problem” in- 
dicated that the “break-even” point was slightly less 
than thirteen optimization runs. Since the problem 
solution required sixteen optimization runs, the 
economics were only slightly in favor of a hybrid solution.
However, a significant hybrid advantage was indicated
if additional work was required, for example,
investigation of alternative mathematical models or
analysis of additional experimental data.

CONCLUSIONS AND COMMENTS 

The present version of the Kinetic Data Analysis
package has, based on limited customer utilization in
EAI Computation Centers, proven to be both an efficient
and an economic means of performing both hybrid
and all-digital studies. For example, the time required
to obtain the all-digital optimization program has been
one man-day or less for small- to medium-sized Kinetic
Data Analysis studies.

Of greater significance, however, is the fact that this 
work has proven the practicality of hybrid applications 
software. It can be used as an effective tool to solve 
frequently occurring problems on a routine basis with
significant reductions in cost and problem preparation 
time. Therefore, the development of general purpose 
packages to solve specific classes of problems on hybrid 
computers would seem to be a fruitful area for future 
work. 


APPENDIX A 

Derivation of alternative reaction rate constant equation

Defining the Arrhenius equation as:

$K = A \cdot \exp(-B/T)$ (1)

and a mid-range absolute temperature as

$1/T^* = (1/T_H + 1/T_L)/2$ (2)

where

A, B = Arrhenius coefficients
T_H = Maximum absolute temperature
T_L = Minimum absolute temperature
T* = Reference absolute temperature
T = Absolute temperature, T_L < T < T_H
K = Reaction rate constant

one obtains:

$K_H = A \cdot \exp(-B/T_H)$ (3)

$K_L = A \cdot \exp(-B/T_L)$ (4)

$K^* = A \cdot \exp(-B/T^*)$ (5)

Combining equations 3 and 4 and equations 1 and 5
yields:

$\ln(K_{HL}) = \ln(K_H / K_L) = B\,(1/T_L - 1/T_H)$ (6)

and

$\ln(K / K^*) = B\,(1/T^* - 1/T)$ (7)

which can then be combined to obtain

$K = K^* \, (K_{HL})^{\theta}$ (8)

and

$\theta = (1/T^* - 1/T) / (1/T_L - 1/T_H)$ (9)

Note that the range of θ, based on equation 2, is
±1/2 and the original Arrhenius coefficients in terms
of K* and K_HL are:

$B = \ln(K_{HL}) / (1/T_L - 1/T_H)$ (10)

and

$A = K^* \cdot \exp(+B/T^*)$ (11)
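
The equivalence of the two parameterizations can be checked numerically. The following is a minimal sketch, assuming arbitrary illustrative values for A, B and the temperature range; the function names are not taken from the KDA package.

```python
import math


def reparametrize(a, b, t_low, t_high):
    """Return (K*, K_HL, T*) from the Arrhenius coefficients A and B."""
    t_ref = 2.0 / (1.0 / t_high + 1.0 / t_low)         # equation (2)
    k_ref = a * math.exp(-b / t_ref)                    # equation (5)
    k_hl = math.exp(b * (1.0 / t_low - 1.0 / t_high))   # equation (6)
    return k_ref, k_hl, t_ref


def theta(t, t_ref, t_low, t_high):
    """Equation (9): normalized reciprocal temperature."""
    return (1.0 / t_ref - 1.0 / t) / (1.0 / t_low - 1.0 / t_high)


def rate_constant(t, k_ref, k_hl, t_ref, t_low, t_high):
    """Equation (8): K = K* * K_HL ** theta."""
    return k_ref * k_hl ** theta(t, t_ref, t_low, t_high)


def arrhenius_coefficients(k_ref, k_hl, t_ref, t_low, t_high):
    """Equations (10) and (11): recover B and A from K* and K_HL."""
    b = math.log(k_hl) / (1.0 / t_low - 1.0 / t_high)
    a = k_ref * math.exp(b / t_ref)
    return a, b


# Hypothetical coefficients and temperature range (absolute temperatures).
A, B = 1.0e6, 5000.0
T_LOW, T_HIGH = 500.0, 600.0
k_ref, k_hl, t_ref = reparametrize(A, B, T_LOW, T_HIGH)
k_direct = A * math.exp(-B / 550.0)                      # equation (1)
k_alt = rate_constant(550.0, k_ref, k_hl, t_ref, T_LOW, T_HIGH)
print(k_direct, k_alt)
```

The two printed values should agree to rounding error, and θ evaluates to +1/2 at T_H and -1/2 at T_L.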
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The extended space technique for 
hybrid computer solution of partial
differential equations* 


by DONALD J. NEWMAN and JON C. STRAUSS 

Carnegie-Mellon University 
Pittsburgh, Pennsylvania 


INTRODUCTION 

The rapid solution of partial differential equations 
(PDE) has been a subject of increasing interest in 
recent years. This interest is partly due to advances
in areas of technology which require the solution of 
PDEs, but is primarily due to the need to apply modern 
optimization and identification techniques to the 
spatially continuous systems that are best modeled 
by PDEs. The parallel organization of the analog 
subsection of a hybrid computer facilitates extremely 
rapid solutions of complicated systems of ordinary 
differential equations (ODEs). Therefore, techniques 
to find a system of ODEs that can be solved to obtain 
a rapid approximate solution to a PDE on the hybrid 
computer have become the subject of intensive investi- 
gation. 

As digital computers have become faster and their
memories larger, interest in symbol manipulation
techniques has also increased, and advances have been made
in the capabilities of computers to perform manipulative
tasks once considered impractically large. The
Galerkin technique for transforming a PDE into a
system of ODEs has been known for some time, but
for more than a crude solution of simple, linear
problems, the quantity of algebra is so large that until
recently this method has not been considered a
practical technique. However, the technology has
progressed to the point where the large quantity of
algebra no longer prevents accurate solutions of both
linear and nonlinear problems.

* This work was supported by National Science Foundation
Grant No. GJ-179. This paper was abstracted from the
dissertation of D. J. Newman, 9 submitted in partial
fulfillment of the requirements for a Ph.D. in Electrical
Engineering from Carnegie-Mellon University.

The Galerkin method employs an assumed solution 
consisting of a sum of time-weighted spatial functions;
this separable form is similar to that used in the analyt- 
ical technique for solution of linear PDEs commonly 
known as the separation of variables method. Each 
spatial function in the separable form is called a mode, 
and these modes are assumed to be known functions 
selected to satisfy the boundary conditions. The Galer- 
kin method yields one ODE for each mode; the solu- 
tion of the resulting system of ODEs yields the time 
varying weighting coefficients of the modes. 

Recent investigation of the errors in assumed many-mode
solutions of PDEs has led to the discovery that,
while for the first few modes the Galerkin method is
very effective, its performance for many-mode solutions
is not satisfactory. The Galerkin method with small
numbers of modes has been demonstrated to give more
accurate solutions than other methods for the same
number of ODEs. 1 If even more accurate solutions
are required, more modes can be introduced into the
solution, but the Galerkin method fails to produce
results with any significant increase in accuracy for
these multi-mode solutions. Although the Galerkin
method has been shown to be convergent, 2 advances
in symbol manipulation capability have shown that
the method is limited in accuracy in practice by the
extremely slow rate of convergence. Therefore, a new
technique that is effective for multi-mode solution is
needed.

In this paper, a technique designed to meet this 
need, the extended space technique, is described and 
demonstrated. After a description of the PDE and 
the notion of assumed modes, a review of the Galerkin 
method introduces a thorough tutorial on the nature of 
the approximation errors. The linear problem with 
polynomial modes is used to further explain the slow 
convergence of the Galerkin method and to explain 
how the extended space technique overcomes this 
defect. Formal notation is introduced to make the 
technique applicable to the nonlinear problem. Fi- 
nally an example problem is presented with compara- 
tive results based on an analytic solution. 

A review and comparison of other hybrid methods
is presented in a previous paper by the authors. 3 A
more thorough explanation of the Galerkin method
and its relationship to other assumed mode methods
is available from a review article by Finlayson and
Scriven, 4 the Ph.D. dissertation of D. J. Newman, 9 and
a recent tutorial article by R. Vichnevetsky. 10

Nonlinear partial differential equation 

The form of the PDE of interest is given in (1) where
u(x,t) is the dependent function of independent
variables x and t, P is a nonlinear partial differential
operator with respect to x, and f is a forcing function.

$\frac{\partial}{\partial t} u(x, t) = P[u(x, t)] + f(x, t)$ (1)

The solution to this problem must satisfy an initial
condition in t and homogeneous boundary conditions
in x on the interval [0, 1]. (The [0, 1] interval is chosen
for notational convenience only; the solution so
obtained may be scaled to any other interval. Brackets
are used to denote “operates on,” and parentheses
are used to denote that the value “depends on.”) Thus
(1) is an initial value problem in t, and retention of
this initial value character in the system of ODEs to
be obtained is desirable. The PDE form given in (1)
appears to include only a limited number of PDEs,
but through proper problem formulation a wide class
of problems can be solved by simultaneous solution
of PDEs of this form.

Assumed modes 

An approximate solution v(x,t) to (1) is proposed
in the separable form of (2).

$v(x, t) = \sum_{i=1}^{n} c_i(t)\, h_i(x)$ (2)

The assumed spatial modes h_i(x) are preselected
to satisfy the orthogonality conditions of (3) and the
spatial homogeneous boundary conditions on the
solution to (1).

$\int_0^1 h_i(x)\, h_j(x)\, dx = \begin{cases} 0, & i \neq j \\ \|h_i\|, & i = j \end{cases}$ (3)

Since the boundary conditions are homogeneous, v(x,t)
also satisfies the spatial boundary conditions. A
previous paper by the authors 3 removes the restriction
to homogeneous boundary conditions, but it is
retained in this paper to simplify the presentation. The
c_i(t) functions are weighting functions for the
assumed modes.

Subject to the conditions stated above, the selection 
of the modes depends on the problem, knowledge of
the solution, and computational convenience. If
specific regions of the space differ in such a way that 
the solution has different characteristics there or very 
high accuracy is required, the problem should be sub- 
divided into regions. The algebra for each region is a 
separate problem, but the resulting ODE systems are 
interdependent. A description of the regionalization 
problem is presented in Reference 9. 

The c_i(t) functions must be determined to give as
nearly as possible the best solution to the PDE in
(1) for the given modes of (2). The best approximation
to the solution is one which matches the modal
expansion of the exact solution u(x,t) for each mode in
the approximate solution v(x,t). If u(x,t) is replaced in
(1) by v(x,t), a residual function R(x,t) must be
introduced to preserve the equality as shown in (4).

$\sum_{i=1}^{n} h_i \frac{d c_i}{dt} = P\left[\sum_{i=1}^{n} c_i h_i\right] + f + R$ (4)

The approximate solution v(x,t) is an exact solution
to equation (4), but the intent is to solve equation (1)
which differs from (4) by an additional forcing function
R. Analyzing the difference between u(x,t) and v(x,t)
is equivalent to analyzing the effect of adding the
residual function R to the PDE.

Galerkin’s approach

The residual R in (4) is determined by the choice
of the weighting functions c_i(t) in the approximate
solution v(x,t). Galerkin suggested in 1915 6 an
approximation method based on orthogonalizing the residual
with respect to the assumed modes; this orthogonality
requirement is described by the n equations in (5).

$\int_0^1 R(x, t)\, h_i(x)\, dx = 0, \quad i = 1, 2, \ldots, n$ (5)

Galerkin does not give any justification for this method
except to say that it is related to the work of Ritz. 6
However, in addition to the strong intuitive appeal, it
is easily shown that the orthogonality condition of
(5) can be obtained by minimizing the integral of the
residual squared with respect to the time derivatives
of the c_i(t). This strong relationship to the variational
methods of Ritz has led some investigators to refer to
this method as the Ritz-Galerkin method.

Substituting (4) into (5) and employing the
orthogonality conditions in (3) yields the ODEs in c_i(t) given
by (6).

$\|h_i\| \frac{d c_i}{dt} = \int_0^1 \left\{ P\left[\sum_{j=1}^{n} c_j h_j\right] + f \right\} h_i \, dx$ (6)

In this paper, the c_i(0) are chosen to give a least squares
fit of v(x,0) in (2) to the initial condition on u(x,t) in
(1). Thus the c_i(t) functions are determined from ODE
initial value problems.

This approach can be generalized to any number of 
spatial variables as shown by Stacey. 7 
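
For a linear operator P and f = 0, equation (6) reduces to a constant-coefficient system dc/dt = A c. The sketch below builds A with exact rational arithmetic for a small test problem chosen here purely for illustration: P[u] = d²u/dx² with u(0) = u(1) = 0 and raw modes x(x - 1) and x²(x - 1). The helper names are assumptions, and polynomials are stored as coefficient lists in ascending powers of x.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials stored as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def poly_deriv(p):
    """Differentiate a polynomial once."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]


def inner(p, q):
    """Integral of p(x) q(x) over [0, 1]."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(poly_mul(p, q)))


def gram_schmidt(raw_modes):
    """Orthogonalize modes (given in increasing degree) on [0, 1]."""
    modes = []
    for raw in raw_modes:
        h = [Fraction(c) for c in raw]
        for g in modes:
            coeff = inner(raw, g) / inner(g, g)
            for k, gk in enumerate(g):
                h[k] -= coeff * gk
        modes.append(h)
    return modes


def galerkin_matrix(modes, operator):
    """A[i][j] = integral(P[h_j] h_i) / integral(h_i^2), from (6) with f = 0."""
    return [[inner(operator(hj), hi) / inner(hi, hi) for hj in modes]
            for hi in modes]


def second_derivative(p):
    return poly_deriv(poly_deriv(p))


raw = [[0, -1, 1], [0, 0, -1, 1]]          # x(x - 1) and x^2 (x - 1)
A = galerkin_matrix(gram_schmidt(raw), second_derivative)
print([[float(v) for v in row] for row in A])
```

For this test problem the two ODEs decouple, with rates -10 and -42; the exact decay rates of the first two sine modes are -π² ≈ -9.87 and -4π² ≈ -39.48, which gives a feel for the accuracy of a two-mode Galerkin solution.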


Where does the residual go? 

To be sure, the residual does not vanish for most
PDEs and most finite sets of modes. The expressions
in (5) ensure that the residual is orthogonal to the
h_i(x) functions, hence the residual is not composed
of the modes that are in the approximate solution to
the problem. However, since the h_i(x) must satisfy the
boundary conditions, they do not form an appropriate
basis for R and hence determination of c_i(t) as in (6)
does not minimize the residual in the most appropriate
subspace.

A more useful form for investigating the residual is
easily obtained by solving (4) for R and combining
with (6) to obtain (7).

$R(x, t) = \sum_{i=1}^{n} \frac{h_i}{\|h_i\|} \int_0^1 \left\{ P\left[\sum_{j=1}^{n} c_j h_j\right] + f \right\} h_i \, dx - P\left[\sum_{j=1}^{n} c_j h_j\right] - f$ (7)

An analysis of (7) reveals that the residual must come
from those parts of P[v(x,t)] and f(x,t) which are
orthogonal to h_i(x). The conclusion is that R acts as a
forcing function composed of components of P[v(x,t)]
and f(x,t) that are orthogonal to h_i(x).

Is this effect good or bad? 

With respect to f(x,t), even if the effect is not good,
at least the effect can be evaluated in terms of the
physical problem. In short, f(x,t) might as well be
assumed to be a function described by (2), and if certain
properties of f must be considered in the problem, modes
characterizing these properties may be carried in the
solution. This is quite tenable if f(x,t) obeys the
boundary conditions, and equally impossible if f(x,t) does
not.

With respect to P[v(x,t)], the effect is not immediately
clear in terms of the physical problem. For modes
which are not themselves solutions of the unforced
problem (not natural modes), the effective forcing
function contributed by R with components that are
not in the solution can have effects on the solution.
In the Galerkin method these effects emerge as errors
in the approximate solution v(x,t) in addition to the
error due to the omission of modes that are in u(x,t).
These errors are caused by errors in the c_i(t) functions
and do not disappear very rapidly when more modes are
added, such as may be done for f(x,t).

Evidently the effects of the residual on the solution 
can be quite pronounced when mode and nonmode 
functions interact as may happen if nonnatural modes 
are employed. 


A special case 

The two-point boundary value problem with a 
second order linear PDE is a meaningful case to study. 
Since the object of this section is to examine the na- 
ture of the residual generated by the Galerkin method, 
the discussion is made more clear by assuming f(x,t) = 
0 and by employing simple polynomial modes. 

The modes for this two-point problem are required
to satisfy the condition that h_i(x) equal zero at the
ends of the solution interval [0,1] for i = 1, 2, 3, ..., n.
Actually more general conditions involving derivatives
of h_i(x) can be used as shown later in an example
problem, and a still wider class of boundary conditions
can be used as described in Reference 3. However,
these conditions are simple and serve to demonstrate
the principles involved.

The simplest polynomial modes that satisfy these
boundary conditions are given in (8).

$\bar{h}_i(x) = x^i (x - 1), \quad i = 1, 2, \ldots, n$ (8)

The bar on the $\bar{h}_i(x)$ indicates that these functions are
not orthogonal, but they are independent. The n
orthogonal functions h_i(x) defined in (3) are readily
generated from the $\bar{h}_i(x)$ by the Gram-Schmidt
procedure.

In order to determine the composition of the
residual, P[h_i(x)] must be examined to determine which
components are orthogonal to all of the h_i(x). For this
purpose, $\bar{h}_i(x)$ is an adequate substitute for h_i(x) and
considerably simplifies the discussion. Since P is a
linear combination of derivative operators, $P[\bar{h}_i(x)]$
could not contain any powers of x greater than i + 1
but could have any lower term including a constant
term. In fact an adequate basis for $P[\bar{h}_i(x)]$ includes,
in addition to the $\bar{h}_i(x)$ functions, two functions 1
and x that do not satisfy the boundary conditions.
Therefore, the residual must be composed of a linear
combination of 1 and x, and the Galerkin solution
for this special case has an effective forcing function
of the form ax + b.
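
The basis argument can be made concrete for one operator. The sketch below assumes P = d²/dx² as a representative linear operator and splits P applied to a raw mode of (8) into a linear part ax + b plus a remainder that vanishes at both ends of [0, 1] and is therefore expressible in the raw modes. The helper names are illustrative, and polynomials are stored as coefficient lists in ascending powers of x.

```python
from fractions import Fraction


def poly_eval(p, x):
    """Evaluate an ascending coefficient list at x."""
    return sum(Fraction(c) * x ** k for k, c in enumerate(p))


def poly_deriv(p):
    """Differentiate a polynomial once."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]


def split_off_linear_part(p):
    """Write p(x) = (a x + b) + r(x) with r(0) = r(1) = 0."""
    b = poly_eval(p, 0)
    a = poly_eval(p, 1) - b
    r = [Fraction(c) for c in p] + [Fraction(0)] * max(0, 2 - len(p))
    r[0] -= b
    r[1] -= a
    return a, b, r


# Raw mode with i = 3 from (8): x^3 (x - 1) = x^4 - x^3.
h_bar_3 = [0, 0, 0, -1, 1]
p = poly_deriv(poly_deriv(h_bar_3))       # P[h_bar_3] = 12 x^2 - 6 x
a, b, r = split_off_linear_part(p)
print(a, b, r)
```

Here the linear part is 6x and the remainder is 12x(x - 1), a multiple of the first raw mode; only the 6x piece lies outside the span of functions that satisfy the boundary conditions.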

Introducing such an extraneous function or al- 
ternatively ignoring such a function if it were part of 
f(x,t) does not seem to be reasonable. Ostensibly the 
residual in the PDE is due to the omission of modes of 
higher degree from the approximate solution; however, 
such a residual would not be a function ax + b but 
would contain all modes especially those of the highest 
degree included in the solution. 

The extended space technique for the special case 

This technique extends the space of functions being
considered for the solution of the second order linear
PDE to include functions, h_{n+1}(x) and h_{n+2}(x), which
are used to absorb the residual and reduce the error
in the coefficients c_i(t); however, these functions are
not included in the actual approximate solution v(x,t).
In the extended space technique, the residual is not
part of P[v(x,t)]; instead, the residual consists of
functions that are not part of the approximate
solution and cannot be generated in the PDE from the
approximate solution. The addition of sufficient
amounts of h_{n+1}(x) and h_{n+2}(x) to remove the ax + b
component from the residual reduces the error in the
coefficients for the modes.

The expression given in (9) is substituted into the
PDE instead of the approximate solution v(x,t) to
generate the extended space residual R_e(x,t).

$\sum_{i=1}^{n+2} h_i(x)\, c_i(t) = v(x, t) + h_{n+1}(x)\, c_{n+1}(t) + h_{n+2}(x)\, c_{n+2}(t)$ (9)

Two functions which are orthogonal to the h_i(x),
i = 1, 2, ..., n + 2 can be found from 1 and x and are
denoted g°_1(x) and g°_2(x). These functions with h_i(x),
i = 1, 2, ..., n form a basis for P[v(x,t)]. The g°_j(x)
functions are employed in (10a) to give two linear
algebraic equations which when solved simultaneously
with the n linear ODEs in (10b) determine the
coefficients c_i(t) in v(x,t).

$\int_0^1 g_j^{\circ}(x)\, R_e(x, t)\, dx = 0, \quad j = 1, 2$ (10a)

$\int_0^1 h_i(x)\, R_e(x, t)\, dx = 0, \quad i = 1, 2, \ldots, n$ (10b)

The two equations in (10a) ensure that the residual
will not have 1 and x as a basis. The equations in
(10b) are essentially the same as those in (5) and ensure
that the residual is orthogonal to the modes. The
conditions in (10) are necessary for the minimization
of the integral of R squared in the subspace with basis
g°_1, g°_2, h_i (i = 1, ..., n). It has been demonstrated that
this is a more appropriate subspace for the description
of R than that with h_i (i = 1, ..., n) alone as a basis. It
should therefore be expected that the extended space
technique gives better results than the Galerkin method.

A close examination of P[u(x,t)] (u(x,t) is the exact
solution) compared to P[v(x,t)] indicates why the
extended space technique does give better results.
P[u(x,t)] can be broken into three important parts:
P[v(x,t)] is one part; a part which has the same basis
as P[v(x,t)] but is generated by functions in u(x,t)
that are not in v(x,t) is a second part; and a part
which has a basis different from P[v(x,t)] is a third.
Because the third part has no effect on (10), it cannot
cause any error in the coefficients c_i(t), but the second
part can. Because the second part is generated by
functions not in v(x,t), it does not appear in (4). Ideally
the residual should be this second part, but since the
second part is functionally indistinguishable from the
first part, the ideal residual cannot be produced. The
extended space technique alleviates the errors caused
by the absence of the second part for two reasons:
(1) the extension functions h_{n+1}(x) and h_{n+2}(x) do
generate some of the second part; (2) since the residual
is composed of these extension functions, the effective
forcing function is not composed of only the g°_1(x) and
g°_2(x) which should have been cancelled out of (4) by
the second part. Particularly in this special case, linear
P with polynomial modes where the greatest
interaction is between adjacent modes, the majority of the
effect of the second part of P[u(x,t)] is absorbed by
these two mechanisms.

A generalization of the technique 

The extended space technique can be generalized 
to cover a nonlinear PDE with m boundary conditions 
where the solution employs nonpolynomial modes. 
Unfortunately, the effect of the technique on the 
residual and the error in the coefficients cannot be 
readily examined under these general conditions. 



In order to proceed with the description, a more
general notation is required: G is the set of functions
that are desired as modes and functions to fit the
forcing function, f(x,t). G has an orthogonal basis of
n + m functions denoted g_i(x). H_n is a subset of G
such that all functions in H_n satisfy the m boundary
conditions. H is an extension of H_n outside of G, but
all of the functions in H also satisfy the boundary
conditions. H also has an orthogonal basis of n + m
functions denoted h_i(x), and the first n of these
functions are in H_n. In addition m functions denoted g°_j(x)
are defined to be orthogonal to all functions in H and
along with the h_i(x) in H_n form a basis for G. The
relationship of these sets of functions is shown
pictorially in Figure 1.

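The approximate solution retains the form given in
(2), but the residual R_e is given by (11).

$\sum_{i=1}^{n+m} h_i \frac{d c_i}{dt} = P\left[\sum_{i=1}^{n+m} c_i h_i\right] + f + R_e$ (11)

The system of equations that are solved to determine
c_i(t) are given by (12) and are derived from
orthogonality conditions as in (10).

$\int_0^1 g_j^{\circ} \left\{ P\left[\sum_{i=1}^{n+m} c_i h_i\right] + f \right\} dx = 0, \quad j = 1, 2, \ldots, m$ (12a)

$\|h_j\| \frac{d c_j}{dt} = \int_0^1 \left\{ P\left[\sum_{i=1}^{n+m} c_i h_i\right] + f \right\} h_j \, dx, \quad j = 1, 2, \ldots, n$ (12b)

Equations (12a) are a nonlinear algebraic system and
(12b) are a nonlinear ODE system.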

A linear PDE problem 

The study of heat transfer within a solid is an
interesting problem in connection with this work
because the surface conditions give rise to a two-point
boundary value problem. Problems of this nature are
encountered in heat exchangers where metallic fins
are cooled by a forced flow of a fluid. In this example
problem, the linear diffusion equation shown in (13)
is used to represent one dimensional heat flow within
the metal fin.

$k \frac{\partial^2 u(x, t)}{\partial x^2} = \mu \frac{\partial u(x, t)}{\partial t}$ (13)

Figure 1 — Function spaces

In (13) u(x,t) is the temperature, k the conductivity
and μ the heat capacity of the metal.

Newton’s law of cooling shown in (14) is used for the
boundary condition at the fin surface cooled with the
fluid at temperature s(t).

$\frac{\partial u(x, t)}{\partial x} = a\,(u(x, t) - s(t))$ (14)

This problem is a linear PDE problem with a linear
differential boundary condition.

The problem is chosen as an example because it 
has an analytical solution. The solution is evaluated 
and used for comparisons of the accuracy of two, three 
and five mode assumed mode solutions employing the 
Galerkin technique and with two and three mode 
solutions employing the extended space technique. 


A metal fin

The example problem deals with a fin of metal
uniform in thickness which is cooled by water on both
sides as shown in Figure 2. The initial temperature
(100°) is uniform throughout the cross section, and
cooler water (0°) begins to circulate by the fin at time
zero. The problem has symmetry so that only half
the fin must be considered in the problem.

The water-metal surface is assumed to obey Newton’s
law of cooling which requires that the rate of
transfer of energy through a boundary be proportional
to the temperature difference across that boundary.
The rate of transfer is proportional to the derivative
of temperature within the metal at the surface. The
difference in temperature across the boundary is the
difference between the temperature within the metal
at the surface and the water temperature, a function
of time s(t). In (14), the proportionality constant a is
assumed to be equal to one for simplicity, and s(t)
is assumed to be the step function given in (15) which
is chosen so that the problem will have a simple analytic
solution.

$s(t) = \begin{cases} 100, & t = 0 \\ 0, & t > 0 \end{cases}$ (15)

While the surface provides one boundary condition,
symmetry provides another since the derivative of
temperature must be zero on the axis of symmetry.
The complete PDE problem is given in (16) where
k = 1/10 and μ = 1.

$\frac{1}{10} \frac{\partial^2 u(x, t)}{\partial x^2} = \frac{\partial u(x, t)}{\partial t}$

$\left. \frac{\partial u(x, t)}{\partial x} \right|_{x=1} = 0$ (16)

$\left. \left[ \frac{\partial u(x, t)}{\partial x} - u(x, t) \right] \right|_{x=0} = 0$

$u(x, 0) = 100$


The analytical solution to this problem is found by
classical separation of variables, is quite complicated,
and is not harmonic in nature. The frequencies of the
sine components vary according to the solutions of
ω = a cot(ω). In a sample problem given by Lebedev,
Skalskaya and Uflyand, 8 an answer is given which is
presumably an exact answer for this problem. Actually
their expression is a close approximation to the exact
solution with an accuracy of better than .01 percent
for a = 1.
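
The frequencies defined by ω = a cot(ω) are easy to obtain numerically. The following is a minimal sketch that brackets each root of ω sin(ω) = a cos(ω) and bisects; it then forms the decay rates -kω²/μ, which is an assumption based on separating variables in (16) with spatial factors of the form cos(ω(1 - x)). Function names and the iteration count are illustrative.

```python
import math


def characteristic_roots(a, count, iterations=80):
    """First `count` positive roots of w * sin(w) = a * cos(w), for a > 0."""
    def g(w):
        return w * math.sin(w) - a * math.cos(w)

    roots = []
    for n in range(count):
        # For a > 0, g changes sign exactly once on (n*pi, n*pi + pi/2).
        lo, hi = n * math.pi, n * math.pi + math.pi / 2
        g_lo = g(lo)
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if g(mid) * g_lo > 0:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots


def decay_rates(a=1.0, k=0.1, mu=1.0, count=5):
    """Assumed modal decay rates -k * w^2 / mu for the problem in (16)."""
    return [-k * w * w / mu for w in characteristic_roots(a, count)]


for w, rate in zip(characteristic_roots(1.0, 5), decay_rates()):
    print(f"omega = {w:.5f}   rate = {rate:.5f}")
```

With a = 1, k = 1/10 and μ = 1 the leading rates should agree closely with the “Analytic” column of Table IV.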


Modes and the ODE system 

The modes for this problem are chosen by application
of the method for homogeneous differential boundary
conditions presented in Reference 3. A simple polynomial
family g_i(x) = ix - x^i is employed as the set
G which satisfies g_i(1) = 0. The modes are integrated
with the boundary condition at x = 0 applied to determine
the integration constant, and the $\bar{h}_i(x)$ shown in
(17) are produced.

$\bar{h}_i(x) = (i + 1)(x + 1) - x^{i+1}, \quad i = 1, 2, \ldots, n$ (17)

The modes h_i(x) are obtained by orthogonalization
of $\bar{h}_i(x)$ and are shown in Table I.

TABLE I — Orthogonal modes for linear PDE problem

h_1(x) = - X^2 + 2X + 2

h_2(x) = - X^3 + (691/432)X^2 - (43/216)X - (43/216)

h_3(x) = - X^4 + (7932/3905)X^3 - (58089/54670)X^2
         + (857/27335)X + (857/27335)

h_4(x) = - X^5 + (148725/59152)X^4 - (30469/14788)X^3
         + (200593/354912)X^2 - (1129/177456)X
         - (1129/177456)

h_5(x) = - X^6 + (432358/143745)X^5 - (31431/9583)X^4
         + (2431444/1581195)X^3 - (1266961/4743585)X^2
         + (6746/4743585)X + (6746/4743585)

The ODE systems for two, three and five modes
employing the Galerkin technique are obtained by the
application of equation (6). The derivatives of c_i(t)
are linear functions of the c_i(t), and the coefficient
matrices of the equations are given in Table II. The
ODE systems for two and three modes employing the
extended space technique are obtained by substitution
of (12a) into (12b) to eliminate the highest two c_i(t).
Again the derivative functions are linear, and the
coefficient matrices are given in Table III.
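
The orthogonalization of (17) and the Galerkin coefficient matrix of (6) can be reproduced with exact rational arithmetic. The sketch below is one way to do so, not the symbol manipulation program used by the authors; helper names are assumptions, and polynomials are stored as coefficient lists in ascending powers of x.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials stored as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def poly_deriv(p):
    """Differentiate a polynomial once."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]


def inner(p, q):
    """Integral of p(x) q(x) over [0, 1]."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(poly_mul(p, q)))


def raw_mode(i):
    """Equation (17): (i + 1)(x + 1) - x^(i + 1)."""
    return ([Fraction(i + 1), Fraction(i + 1)]
            + [Fraction(0)] * (i - 1) + [Fraction(-1)])


def gram_schmidt(raw_modes):
    """Orthogonalize modes (given in increasing degree) on [0, 1]."""
    modes = []
    for raw in raw_modes:
        h = list(raw)
        for g in modes:
            coeff = inner(raw, g) / inner(g, g)
            for k, gk in enumerate(g):
                h[k] -= coeff * gk
        modes.append(h)
    return modes


K = Fraction(1, 10)                       # conductivity k in (16); mu = 1
modes = gram_schmidt([raw_mode(i) for i in range(1, 6)])
# Equation (6) with f = 0 and P[u] = k * d2u/dx2, so that dc/dt = A c.
A = [[K * inner(poly_deriv(poly_deriv(hj)), hi) / inner(hi, hi)
      for hj in modes] for hi in modes]

print(modes[1])                           # compare with h_2(x) in Table I
for row in A[:2]:
    print([float(v) for v in row[:2]])    # compare with Table II
```

Because the modes are nested and orthogonal, each entry of A does not depend on how many modes are carried, so the two and three mode systems are the leading blocks of the five mode matrix.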


TABLE III — Coefficient matrices for extended space technique

Two modes:

  -.0740726      .000426177
   .142572     -1.174004

Three modes:

  -.0740739      .000427785    -1.799 × 10^-5
   .1431100    -1.174666        .0741377
  -.301688       .371627      -4.15921


Numerical results 

The simple boundary condition employed in this 
problem to facilitate obtaining an analytical solution 
presents some severe difficulties in obtaining a good 
fit to the initial condition. The modes must satisfy this 
unrealistic boundary condition which imposes a steep 
slope at x = 0 where the initial condition is flat. 
Figure 3 shows the solution fit to the initial condition 
for two, three and five modes for both techniques. 
Even at five modes the fit is not entirely satisfactory; 
however, for small numbers of modes, the analytical 
solution suffers from the same defect. This is the cost 
that must be paid to obtain an analytical solution for 
comparison. 

Figure 4 shows the solution at times of five, 20 and 
100 seconds for both techniques. The solutions, exact, 
two, three and five mode are indistinguishable on a 
graph of this scale. The hybrid solution also produces 
identical results and the analog block diagram for 
this problem with three modes is shown in Figure 5. 


TABLE II — Coefficient matrices for Galerkin method

(The two and three mode matrices are the leading 2 × 2
and 3 × 3 blocks of the five mode matrix.)

  -.074074    4.2867 × 10^-4    1.8292 × 10^-5    1.4610 × 10^-6   -1.5616 × 10^-7
   .14341    -1.1769            7.7397 × 10^-3   -3.8192 × 10^-3    7.5717 × 10^-5
  -.30674      .38796         -19.084              .020270          -.034980
   .71994    -5.6258             .59565          -9.4932             .033540
 -1.8134      2.6283          -24.223              .79036         -17.439

Figure 3 — Linear PDE problem T = 0



Figure 4 — Linear PDE problem solution 



Figure 5— Linear PDE problem analog diagram 

In order to compare the accuracies of the different
solutions, errors for cross sections at five seconds are
chosen because the five second cross section has the
greatest error and because at five seconds the analytic
solution is sufficiently convergent to give an accurate
basis for comparison. Figure 6 shows the error curves
on a greatly magnified scale (full scale is .15 to .2
percent of the solution) for the three assumed mode
solutions using the Galerkin method. The improvement
between the two and three mode solution is substantial
but the five mode solution is disappointingly similar
to the three mode solution. The error does not decrease
nor does it change shape. Since in a five mode solution
only the seventh and higher degree polynomials are
excluded from the solution, the logical conclusion would
be that the error would have three maxima and three
minima; but since it does not, possibly an error has
crept into a lower mode which is not diminishing to
zero very rapidly. The analysis performed previously
indicates that this is in fact the case and that even
though this error does slowly diminish as the number
of modes increases, the Galerkin method on linear
problems leaves all the error in the modes that are
part of the solution.

Figure 6 — Galerkin method error T = 5.0

The extended space technique shows a dramatic 
improvement in the accuracy of the results for three 
modes. Figure 7 shows the error curves for the two 
and three mode solution; the three mode solution 
matches the analytic solution so well that a five mode
solution is not needed. The error shown for the three 
mode solution is so small that it is comparable to the 
errors in numerical integration of the ODE system 
and is only meaningful in the sense that it is a great 
improvement over the Galerkin method. 



Figure 7 — Extended space error T = 5.0 
TABLE IV — Comparison of eigenvalues

               Galerkin                          Extended Space
Analytic       2-Mode     3-Mode     5-Mode      2-Mode     3-Mode

 -.07402       -.07402    -.07403    -.07403     -.07402    -.07402
-1.1734       -1.1770    -1.1768    -1.1742     -1.1741    -1.1738
-4.1439                  -19.084    -9.4916                -4.1601
-9.0810                             -17.028
-16.000                             -19.499



TABLE V — Comparison of digital computation times

              Galerkin                    Extended Space
              Algebra    Integration*     Algebra**    Integration*

2-Mode           11           6               98            6
3-Mode           34           7              234            7
5-Mode          209          10

Times in seconds for IBM-360/65

* Does not include the time for the compilation of the ODE derivative
subprogram, which requires about 30 seconds.

** Values corrected to remove estimated program compilation time, which
was not included in the other timings.

The eigenvalues for the various solutions shown in 
Table IV indicate why the extended space technique 
produces such accurate results. For all solutions the 
first two eigenvalues match the eigenvalues obtained 
from the exact solution very well. The third eigenvalue 
for the Galerkin method is never very near the exact 
value even for five modes; however, the extended space 
technique produces an eigenvalue very near the exact 
solution with only three modes. In fact the extended 
space technique produces a much better eigenvalue 
for three modes than the Galerkin method does for 
five modes. Table V presents a comparison of the 
digital computation times to do the algebra necessary 
to prepare the ODE systems and times to do the 
numerical integration of the systems for the Galerkin 
method and the extended space technique. Compu- 
tation time on the hybrid computer to solve the ODE 
system is the same for all cases and may be as small 
as 10 milliseconds on the Carnegie-Mellon University 
EAI-680/PDP-9 hybrid computer depending on the 
I/O device used to monitor the solution. 
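
The eigenvalue comparison can be made concrete with a little arithmetic. The sketch below computes the relative error of the third eigenvalue for each approximation, using values copied from Table IV; the dictionary keys and the helper name `relative_error` are illustrative only.

```python
# Third eigenvalue of the example problem, copied from Table IV.
analytic = -4.1439
estimates = {
    "Galerkin 3-mode": -19.084,
    "Galerkin 5-mode": -9.4916,
    "Extended space 3-mode": -4.1601,
}


def relative_error(approx, exact):
    """Relative deviation of an approximate eigenvalue from the exact one."""
    return abs(approx - exact) / abs(exact)


errors = {name: relative_error(value, analytic) for name, value in estimates.items()}
for name, err in errors.items():
    print(f"{name}: {100.0 * err:.1f}% relative error")
```

On these figures the three mode extended space eigenvalue is within half a percent of the analytic value, while the five mode Galerkin value is off by more than a factor of two.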


CONCLUSIONS 

The example problem has demonstrated how much of 
an improvement the extended space technique can 
be over the classical Galerkin method. Both the ac- 
curacy of the solution and the eigenvalues of the ODE 
system are better for the three mode extended space 
technique solution. However, this improvement is not 
obtained without some increased cost: the quantity of 
algebra that must be performed to determine the 
three mode extended space solution is about equal to 
the quantity to determine the five mode Galerkin 
solution. Even when this increased cost is considered, 
the extended space technique is superior because the 
three mode solution is better than the Galerkin five 
mode solution. 

This technique is also applicable to nonlinear prob- 
lems, but no experimental results are available at 
present. The nonlinear application has an additional 
complication: the simultaneous solution of nonlinear 
algebraic equations and a nonlinear ODE system is 
required. Work on a nonlinear problem is currently 
being done and results are expected to indicate com- 
parable superiority over the Galerkin method for 
nonlinear problems. 
INTRODUCTION 

When compared to continuous (analog) computation,
hybrid computation is subject to two sources of error 
not associated with hardware, but caused by its 
logical nature. They are often referred to as the time 
(or transport) delay, and the reconstruction errors. 

This time delay error is caused by the time taken for 
the digital computer to process the data sampled from 
the analog computer, before sending the updated 
results back to the analog. The reconstruction error 
results from the hold action of the digital-to-analog 
link: the updated value from the digital is sent to 
the analog and held fixed until the next updating, in- 
stead of being updated continuously. 

The effect of these errors on the hybrid solution (as 
compared with a pure analog solution) is twofold. 
First, inaccuracies are introduced. Second, the hybrid 
solution may become instable and grow without bound, 
even though the correct solution is bounded or even 
decreases to zero. 

To prevent instability and minimize error, hybrid 
computations utilize compensation techniques. The 
variables processed in the digital computer for use in 
the analog computer are calculated at some future time, 
by an extrapolation scheme, before being sent to the 
analog. Depending on the scheme used, this technique 
can have a beneficial effect on the accuracy and stability 
of the solution, for a given sampling interval. 

There are a number of extrapolation techniques 
commonly used to achieve compensation. One such 


technique is that of multistep extrapolation, or digital 
filters, in which values of the variables at earlier time 
are used for extrapolation. A good discussion of this 
method is given by Mitchell.¹ He demonstrates its
shortcomings for heavily damped systems, caused by 
the instability of the extraneous solutions introduced 
by use of values at earlier times. For each step back 
in time, one extraneous solution is introduced, and 
these solutions are instable for large enough sampling 
intervals. The popular three-step, or parabolic, extrap- 
olation introduces two such solutions, and their 
amplitude increases with increasing damping, so that 
heavily damped systems require small sampling inter- 
vals for stability. 

Some years ago, Miura and Iwata² suggested another
technique of extrapolation. For solving differential 
equations, they used the derivative of each variable 
to extrapolate, rather in the manner of a Taylor series. 
The implementation suggested was to add to the out- 
put of an integrator a multiple of the input, the sum 
being the extrapolated value of the variable. Further 
use of this scheme, for undamped systems, was made 
by Gilbert³ and Karplus⁴,⁵ with several implementations
suggested. Gilbert³ analyzed the undamped system,
using z-transforms. This extrapolation technique has 
the advantage of requiring either no backward steps, 
or only one, depending on the implementation, thus 
eliminating or reducing the number of extraneous 
solutions introduced. The result is a solution which is 
not only more accurate than the uncompensated 
hybrid solution, but can be more stable. This is in
contrast to the use of multi-step methods, which im- 
prove the accuracy but reduce the stability compared 
to the uncompensated hybrid solution. 

There is apparently only one published reference to the use of the
method of Miura and Iwata for a damped second order system. Bekey and
Karplus,⁶ on pages 382-383 of Chapter 12, give some results of
unpublished* work of Howe and Fogarty.⁶ In this work, they extrapolate
$x$ and $\dot{x}$ by using $1.5T$ times $\dot{x}$ and $\ddot{x}$
respectively, where $T$ is the sampling interval. They use an
implementation where the extrapolation is performed in the analog
computer, the extrapolated values are sampled by the digital computer,
combined to give $\ddot{x}$, and then converted D to A and sent to the
analog computer for integration. We can call this calculation of
extrapolated values in the analog computer analog compensation. The
analysis by z-transforms is based on a timing sequence in which the A
to D sampling occurs before the D to A conversion of $\ddot{x}$. The
result of this compensation scheme is two desirable solutions which
have exponents whose errors are of order $(\omega T)^2$, in contrast to
errors of order $\omega T$ for the uncompensated solution, where
$\omega$ is the natural frequency. However, there are two extraneous
solutions of order $(\zeta\omega T)^{1/2}$, where $\zeta$ is the damping
coefficient, in contrast to the single extraneous solution of order
$\zeta\omega T$ for no compensation. Therefore, we see that in this case
derivative compensation improves the accuracy, but it reduces the
stability, compared to no compensation.

This situation can be improved if we change to 
what might be called digital compensation. Here, we 
sample $x$ and $\dot{x}$, and do the extrapolations in the digital
computer. This is the scheme used in the present re- 
port. For a damped system, it uses no backward time 
steps, instead of the one backward step inherent in 
the Howe-Fogarty implementation. Therefore, it has 
only one extraneous solution, of order $\zeta\omega T$, and is somewhat
more stable than the uncompensated case because
of a better numerical factor. The accuracy of the two 
desirable solutions is of the same order as those of 
Howe and Fogarty. 

The same scheme as that given for digital compensa- 
tion in this report can be obtained by the analog com- 
pensation method of Howe and Fogarty if they change 
the order of A to D sampling and D to A conversion 
and perform D to A before A to D. This may not be a 


* Prof. Fogarty kindly sent me a copy of this report, and the 
remarks in this paragraph are based on my analysis of Section 5 
of the report. 


desirable implementation because the transients set 
up by D to A may interfere with the values sampled
A to D immediately thereafter. 

The purpose of this report is to extend the use of 
derivatives for extrapolation, to apply the method to a 
damped second order system typical of control prob- 
lems, to analyze the system by use of z-transforms, 
and to compare the analysis with hybrid calculations
using both derivative compensation and multi-step 
compensation. 

The extension of the derivative method, which is 
also referred to as Taylor series compensation, is in 
several directions. First, we not only correct $x$ by using
$\dot{x}$, but also by using $\ddot{x}$, since that derivative is also
available. Second, we do not assume an extrapolation
ahead by $1.5T$, but carry along arbitrary constants
which are then chosen to give greatest accuracy. The
first order corrections are indeed found by this method
to be $1.5T$, providing a simple analytical derivation
of this fact. The second order coefficient of $\ddot{x}$ may be
chosen in several ways to enhance accuracy or stability.

The analysis is applied to a linear damped oscillator, 
forced by a control function which is a linear combination
of $x$ and $\dot{x}$. The oscillator is implemented on the
analog computer, the control function on the digital 
computer. 

The z-transform analysis yields formulas which can be 
used to predict the stability of both the compensated and 
uncompensated cases for any values of the parameters 
and sampling interval. Similar results are given for the 
three-step compensation scheme, and show it to be 
less stable. 

A numerical test was made by implementing both 
schemes on a Beckman 2200/SDS 9300 hybrid com- 
puter. The hybrid calculations were compared with 
continuous calculations of the same system made on 
the analog computer. The superior accuracy and sta- 
bility of the Taylor series method over the three-step 
method is clearly apparent in the strip chart results, 
as well as in the digital printouts. 

Analysis 

Continuous solution 

The forced oscillator analyzed is defined by

$$\ddot{x} + 2\omega\zeta\dot{x} + \omega^2 x = \omega^2\delta \tag{2.1}$$

$$\delta = K x_c - K(\tau\dot{x} + x) \tag{2.2}$$

where $K$ and $\tau$ are constant control parameters. The
command input $x_c$ is taken to be a constant here, for
ease of analysis. Further, only the simple initial
conditions $x(0) = 0$, $\dot{x}(0) = 0$ are considered, although
other values bring only algebraic complication.

The exact continuous solution of this problem is
simply obtained by transposing the variables on the
right side and defining total frequency and damping by

$$\omega_T^2 = \omega^2(1 + K), \qquad 2\omega_T\zeta_T = 2\omega\zeta + \omega^2 K\tau \tag{2.3}$$

The solution with zero initial conditions is then

$$x = \frac{K x_c}{1 + K}\left[1 - \frac{1}{2}\left(1 - \frac{i\zeta_T}{\zeta'_T}\right)e^{\lambda_{T1}t} - \frac{1}{2}\left(1 + \frac{i\zeta_T}{\zeta'_T}\right)e^{\lambda_{T2}t}\right] \tag{2.4}$$

$$\zeta'_T = (1 - \zeta_T^2)^{1/2}, \qquad \lambda_{T1,2} = \omega_T(-\zeta_T \pm i\zeta'_T)$$

where the $\lambda_{T1,2}$ are the roots of the characteristic
equation

$$\lambda_T^2 + 2\omega_T\zeta_T\lambda_T + \omega_T^2 = \lambda_T^2 + (2\omega\zeta + \omega^2 K\tau)\lambda_T + \omega^2(1 + K) = 0 \tag{2.5}$$
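
As a sketch of how (2.3) and (2.4) are used in practice, the following solves (2.3) for the control parameters K and τ given the open-loop and total frequency and damping, and evaluates the continuous solution (2.4). The function names are illustrative, and the numerical values are the parameter set quoted later for the hybrid runs (ω = 0.412, ζ = -0.2425, ζ_T = 0.7, ω_T = 15).

```python
import cmath
import math


def control_parameters(omega, zeta, omega_t, zeta_t):
    """Solve the two relations of (2.3) for K and tau."""
    K = (omega_t / omega) ** 2 - 1.0
    tau = (2.0 * omega_t * zeta_t - 2.0 * omega * zeta) / (omega ** 2 * K)
    return K, tau


def exact_x(t, omega_t, zeta_t, K, x_c=1.0):
    """Continuous solution (2.4) for zero initial conditions (zeta_t < 1)."""
    zeta_p = math.sqrt(1.0 - zeta_t ** 2)
    lam1 = omega_t * complex(-zeta_t, zeta_p)   # root with the + sign
    lam2 = lam1.conjugate()
    c1 = 0.5 * complex(1.0, -zeta_t / zeta_p)
    c2 = c1.conjugate()
    bracket = 1.0 - c1 * cmath.exp(lam1 * t) - c2 * cmath.exp(lam2 * t)
    return K * x_c / (1.0 + K) * bracket.real


K, tau = control_parameters(0.412, -0.2425, 15.0, 0.7)
print(f"K = {K:.1f}, tau = {tau:.4f}")
print(f"x(0) = {exact_x(0.0, 15.0, 0.7, K):.3e}, x(5) = {exact_x(5.0, 15.0, 0.7, K):.6f}")
```

For this parameter set (2.3) gives K of about 1324.5 and τ of about 0.0943, and the solution settles to K x_c/(1 + K).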

Hybrid difference equations

The hybrid implementation considers the $\delta$ term as a
control function which is calculated digitally while the
left side of (2.1) is calculated continuously in the analog
computer. Thus, between the sampling times $nT$ and
$(n + 1)T$, $\delta$ is held fixed at the value $\delta_{p,n}$ supplied to the
analog at $t = nT$.

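Therefore during this interval the analog solves

$$\ddot{x} + 2\omega\zeta\dot{x} + \omega^2 x = \omega^2\delta_{p,n} \tag{2.6a}$$

with initial conditions

$$x = x_n, \qquad \dot{x} = \dot{x}_n \tag{2.6b}$$

The solution of (2.6) is

$$x = \left[\frac{\lambda_2(x_n - \delta_{p,n}) - \dot{x}_n}{\lambda_2 - \lambda_1}\right]e^{\lambda_1(t - nT)} + \left[\frac{\dot{x}_n - \lambda_1(x_n - \delta_{p,n})}{\lambda_2 - \lambda_1}\right]e^{\lambda_2(t - nT)} + \delta_{p,n} \tag{2.7a}$$

$$\dot{x} = \lambda_1\left[\frac{\lambda_2(x_n - \delta_{p,n}) - \dot{x}_n}{\lambda_2 - \lambda_1}\right]e^{\lambda_1(t - nT)} + \lambda_2\left[\frac{\dot{x}_n - \lambda_1(x_n - \delta_{p,n})}{\lambda_2 - \lambda_1}\right]e^{\lambda_2(t - nT)} \tag{2.7b}$$

where $\lambda_{1,2}$ are the roots of the free-vibration
characteristic equation

$$\lambda^2 + 2\omega\zeta\lambda + \omega^2 = 0, \qquad \lambda_{1,2} = \omega(-\zeta \pm i\zeta'), \qquad \zeta' = (1 - \zeta^2)^{1/2} \tag{2.8}$$

At $t = (n + 1)T$ these are expressible in real form as

$$x_{n+1} = e^{-\omega\zeta T}\left[(x_n - \delta_{p,n})\left(\cos\omega\zeta' T + \frac{\zeta}{\zeta'}\sin\omega\zeta' T\right) + \frac{\dot{x}_n}{\omega\zeta'}\sin\omega\zeta' T\right] + \delta_{p,n} \tag{2.9a}$$

$$\dot{x}_{n+1} = e^{-\omega\zeta T}\left[\dot{x}_n\left(\cos\omega\zeta' T - \frac{\zeta}{\zeta'}\sin\omega\zeta' T\right) - \frac{\omega}{\zeta'}(x_n - \delta_{p,n})\sin\omega\zeta' T\right] \tag{2.9b}$$

These two equations are difference relations between
$x_n$, $\dot{x}_n$ and $x_{n+1}$, $\dot{x}_{n+1}$, with given $\delta_{p,n}$. Equations (2.7)
show that the analog computer produces segments of
forced damped vibrations between sampling times, each
joined to the adjacent segments with continuous $x$ and $\dot{x}$,
but discontinuous $\ddot{x}$, because $\delta_{p,n}$ changes at each
sampling time. The hybrid system solves the difference
equations (2.9), as will we, but first $\delta_{p,n}$ must be specified
in terms of $x$ and $\dot{x}$ to model the digital part of the
calculation.

Taylor series compensation

The digital calculation of $\delta_{p,n}$, the value sent to the
analog at time $nT$, can only depend on quantities
sampled by the digital at previous sampling times. We
will project $x$ and $\dot{x}$ and take $\delta_{p,n}$ to be given by the
projected values according to (2.2):

$$\delta_{p,n} = K x_c - K(\tau\dot{x}_{p,n} + x_{p,n}) \tag{2.10}$$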

The projections are accomplished from $x_{n-1}$, $\dot{x}_{n-1}$ by a
Taylor series form

$$x_{p,n} = x_{n-1} + lT\dot{x}_{n-1} + kT^2\ddot{x}_{n-1} \tag{2.11a}$$

$$\dot{x}_{p,n} = \dot{x}_{n-1} + hT\ddot{x}_{n-1} \tag{2.11b}$$

We have used as many terms as the available derivatives
allow. The quantity $\dot{x}_{n-1}$ can be sampled and made
available in the digital. The second derivative is
calculated from the differential equation (2.6a)

$$\ddot{x}_{n-1} = -2\omega\zeta\dot{x}_{n-1} - \omega^2 x_{n-1} + \omega^2\delta_{p,n-1} \tag{2.12}$$
Equations (2.10)-(2.12) are the essence of the Taylor
series compensation scheme proposed here. In contrast,
a three-step scheme would project $\delta_{p,n}$ from previous $\delta$'s:

$$\delta_{p,n} = a_0\delta_{n-1} + a_1\delta_{n-2} + a_2\delta_{n-3} \tag{2.13a}$$

where

$$\delta_{n-1} = K x_c - K(\tau\dot{x}_{n-1} + x_{n-1}) \tag{2.13b}$$

and similarly for $\delta_{n-2}$, $\delta_{n-3}$. This scheme goes back to
$(n - 3)T$, two steps further than (2.11).

In both cases the constants $l$, $k$, $h$, or $a_0$, $a_1$, $a_2$ are
available to help improve the solution. For the three-step
method, it is conventional to project to the time
$(n + 1/2)T$, for which the values of the constants are

$$a_1 = -21/4, \qquad a_2 = 15/8, \qquad a_0 = 1 - a_1 - a_2 = 35/8 \tag{2.14}$$

If we project (2.11) the same distance, we find

$$l = h = 3/2, \qquad k = 9/8 \tag{2.15}$$

Instead we will carry the constants along, and choose
their values on the basis of the resulting formulas.
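
The constants in (2.14) and (2.15) follow directly from projecting a distance of 3T/2 past the newest sample, as the sketch below checks with exact rational arithmetic. It assumes the three-step projection is the parabola through the three most recent samples (Lagrange weights) and that the Taylor constants are the usual series coefficients s, s, s²/2 for a projection distance s in units of T; the function names are illustrative.

```python
from fractions import Fraction


def lagrange_weights(nodes, u):
    """Weights of the polynomial through `nodes`, evaluated at the point u."""
    weights = []
    for i, ui in enumerate(nodes):
        w = Fraction(1)
        for j, uj in enumerate(nodes):
            if j != i:
                w *= Fraction(u - uj) / Fraction(ui - uj)
        weights.append(w)
    return weights


def taylor_constants(s):
    """(l, h, k) for a plain Taylor projection over s sampling intervals."""
    return s, s, s * s / 2


# Samples at (n-1)T, (n-2)T, (n-3)T, measured in units of T from the newest one.
nodes = [0, -1, -2]
s = Fraction(3, 2)                      # from (n-1)T to (n+1/2)T
a0, a1, a2 = lagrange_weights(nodes, s)
l, h, k = taylor_constants(s)
print("a0, a1, a2 =", a0, a1, a2)
print("l, h, k    =", l, h, k)
```

Changing `s` shows how both sets of constants vary with the projection distance, which is the freedom the analysis below exploits.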

The final form of $\delta_{p,n}$ comes by inserting (2.11) and
(2.12) into (2.10) to obtain

$$\delta_{p,n} = K x_c - K\{x_{n-1}(1 - h\omega\tau\,\omega T - k\omega^2T^2) + \delta_{p,n-1}(h\omega\tau\,\omega T + k\omega^2T^2) + \omega^{-1}\dot{x}_{n-1}[\omega\tau + (l - 2\zeta h\omega\tau)\omega T - 2\zeta k\omega^2T^2]\} \tag{2.16}$$

We now have the three difference equations (2.9a),
(2.9b) and (2.16) for the three unknowns $x_n$, $\dot{x}_n$ and $\delta_{p,n}$.
Their solution will provide the result of our model of the
hybrid calculation.
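
The model can also be iterated directly. The sketch below is a minimal simulation of the difference equations, assuming the analog update (2.9a, b) and the digital projection (2.10)-(2.12); the function names, the start-up value of δ (taken as K x_c, which corresponds to zero values before t = 0) and the run lengths are assumptions made for illustration. The parameters are those of the runs reported later (ω = 0.412, ζ = -0.2425, ζ_T = 0.7, ω_T = 15).

```python
import math


def analog_step(x, v, delta, omega, zeta, T):
    """Advance x and dx/dt over one interval T with delta held fixed, per (2.9a, b)."""
    zeta_p = math.sqrt(1.0 - zeta ** 2)
    decay = math.exp(-omega * zeta * T)
    c = math.cos(omega * zeta_p * T)
    s = math.sin(omega * zeta_p * T)
    u = x - delta
    x_next = decay * (u * (c + zeta / zeta_p * s) + v * s / (omega * zeta_p)) + delta
    v_next = decay * (v * (c - zeta / zeta_p * s) - omega / zeta_p * u * s)
    return x_next, v_next


def projected_delta(x, v, delta_prev, K, tau, x_c, omega, zeta, T, l, h, k):
    """Digital side: second derivative from (2.12), projection (2.11), control (2.10)."""
    acc = -2.0 * omega * zeta * v - omega ** 2 * x + omega ** 2 * delta_prev
    x_p = x + l * T * v + k * T ** 2 * acc
    v_p = v + h * T * acc
    return K * x_c - K * (tau * v_p + x_p)


def simulate(T, steps, l, h, k, omega=0.412, zeta=-0.2425,
             omega_t=15.0, zeta_t=0.7, x_c=1.0):
    """Iterate the hybrid model; l = h = k = 0 gives the uncompensated case."""
    K = (omega_t / omega) ** 2 - 1.0                      # from (2.3)
    tau = (2.0 * omega_t * zeta_t - 2.0 * omega * zeta) / (omega ** 2 * K)
    x, v = 0.0, 0.0
    delta = K * x_c                                       # assumed start-up value
    history = []
    for _ in range(steps):
        x_next, v_next = analog_step(x, v, delta, omega, zeta, T)
        # The next control value uses only the samples taken at the start of the interval.
        delta = projected_delta(x, v, delta, K, tau, x_c, omega, zeta, T, l, h, k)
        x, v = x_next, v_next
        history.append(x)
    return history, K


for label, consts in (("uncompensated", (0.0, 0.0, 0.0)),
                      ("Taylor series", (1.5, 1.5, 13.0 / 12.0))):
    for T in (0.025, 0.050):
        hist, K = simulate(T, 400, *consts)
        print(f"{label:14s} T = {T:.3f}: |x| after 400 steps = {abs(hist[-1]):.3e}")
```

With these parameters both schemes settle to K x_c/(1 + K) at T = 25 ms and both grow without bound at T = 50 ms, in line with the hybrid runs described in the implementation section.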

Solution by z-transform 

The z-transform provides a simple method of solving
the difference equations. The definition of the z-transform
of the sequence $x_n$ is

$$x^* = \sum_{n=0}^{\infty} x_n z^{-n} \tag{2.17}$$

and for our purposes its important property is

$$\sum_{n=0}^{\infty} x_{n+1} z^{-n} = z(x^* - x_0) \tag{2.18}$$

The inversion of a z-transform follows easily by
observing from the definition (2.17) that

$$z^{k-1}x^* = \sum_{n=0}^{\infty} x_n z^{k-n-1}$$

If this is looked upon as a Laurent expansion in the
complex variable $z$, the residue is the coefficient of the
term for which $n = k$, which is $x_k$. Thus the inversion
of $x^*$ to find $x_n$ is accomplished by finding, for each $n$,

$$\mathrm{Residue}\,(z^{n-1}x^*) = x_n \tag{2.19}$$

The stability of the solution is also indicated by
(2.19). Stability requires that $x_n$ not grow as $n$ increases.
The only factor in the residue which depends on $n$ is $z^n$,
which grows or decreases with $n$ depending on whether
the absolute value of $z$ is greater or less than unity. This
leads to the well-known stability criterion that every
root of the denominator of $x^*$ must have absolute value
equal to or less than unity.

The transformation of (2.9) and (2.16) is accomplished
by multiplying by $z^{-n}$ and $z^{-n+1}$ respectively, summing
and using (2.17) and (2.18), remembering the initial
conditions are zero. The result is

$$(z - 1)x^* - \dot{x}^*\frac{e^{-\omega\zeta T}}{\omega\zeta'}\sin\omega\zeta' T + (x^* - \delta_p^*)\left[1 - e^{-\omega\zeta T}\left(\cos\omega\zeta' T + \frac{\zeta}{\zeta'}\sin\omega\zeta' T\right)\right] = 0 \tag{2.20a}$$

$$\dot{x}^*\left[z - e^{-\omega\zeta T}\left(\cos\omega\zeta' T - \frac{\zeta}{\zeta'}\sin\omega\zeta' T\right)\right] + (x^* - \delta_p^*)\frac{\omega}{\zeta'}e^{-\omega\zeta T}\sin\omega\zeta' T = 0 \tag{2.20b}$$

$$(z + K)x^* + \dot{x}^*K\omega^{-1}[\omega\tau + (l - 2\zeta h\omega\tau)\omega T - 2\zeta k\omega^2T^2] - (x^* - \delta_p^*)[z + K(h\omega\tau\,\omega T + k\omega^2T^2)] = \frac{z^2Kx_c}{z - 1} \tag{2.20c}$$

These equations have been arranged so the variables
are the z-transforms $x^*$, $\dot{x}^*$, and $x^* - \delta_p^*$, and their
solution gives the z-transforms of the problem variables,
which must then be inverted to yield formulas for the
actual solution.

If the three equations are solved by determinants
the denominator is given by the determinant of the
coefficients,

$$\Delta = -z[(z - 1)^2 - 2z(e^{-\omega\zeta T}\cos\omega\zeta' T - 1) + (e^{-2\omega\zeta T} - 1)] + K\{(z + 1)(e^{-\omega\zeta T}\cos\omega\zeta' T - 1) - (e^{-2\omega\zeta T} - 1) + (z - 1)\frac{1}{\zeta'}e^{-\omega\zeta T}\sin\omega\zeta' T\,[\zeta - \omega\tau - (l - \zeta h\omega\tau)\omega T + \zeta k\omega^2T^2] - (z - 1)[(z - 1) - (e^{-\omega\zeta T}\cos\omega\zeta' T - 1)](h\omega\tau\,\omega T + k\omega^2T^2)\} \tag{2.21}$$

This is a cubic in $z$, whose roots determine the solution
through their residues, according to (2.19).

The solution for $x^*$ is then

$$x^* = \frac{z^2Kx_c}{(z - 1)\Delta}\left[(z + 1)(e^{-\omega\zeta T}\cos\omega\zeta' T - 1) - (e^{-2\omega\zeta T} - 1) + (z - 1)\frac{\zeta}{\zeta'}e^{-\omega\zeta T}\sin\omega\zeta' T\right] \tag{2.22}$$

An additional root at $z = 1$ is visible here, whose residue
also makes a contribution.

Expansion of roots 

The nature of the roots of $\Delta$ can be seen by letting $T$
approach zero in (2.21). Then all terms approach zero
except the first, so one root must approach zero, the
other two approach unity. The exact roots are complicated
to find since (2.21) is cubic, but we can be
satisfied with expansions of the roots in powers of $\omega T$.

Let us first look for a root of the form:

$$z = 1 + d\,\omega T + e\,\omega^2T^2 + f\,\omega^3T^3 + \cdots \tag{2.23}$$

If the coefficients of (2.21) are also expanded in powers
of $\omega T$, and (2.23) is inserted, setting the lowest two
powers of $\omega T$ to zero yields

$$d^2 + (2\zeta + K\omega\tau)d + (1 + K) = 0 \tag{2.24}$$

$$e = \tfrac{1}{2}d^2 - \frac{Kd[d\omega\tau(h - 3/2) + (l - 3/2)]}{2(d + \omega_T\zeta_T/\omega)} \tag{2.25}$$

These determine the first two coefficients in (2.23). The
solution of (2.24) is

$$d_{1,2} = \omega_T(-\zeta_T \pm i\zeta'_T)/\omega = \lambda_{T1,2}/\omega \tag{2.26}$$

where $\omega_T$, $\zeta_T$, $\lambda_T$ are defined in (2.3) and (2.4). Thus the
first coefficient is identical with the exponent of the
exact solution.
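
The identity (2.26) is easy to confirm numerically. The sketch below solves the quadratic (2.24) for d and compares ωd with the continuous-system root λ_T of (2.5), using the parameter values of the later hybrid runs; the function name `d_roots` is illustrative.

```python
import cmath
import math


def d_roots(K, omega_tau, zeta):
    """Roots of (2.24): d^2 + (2 zeta + K omega tau) d + (1 + K) = 0."""
    b = 2.0 * zeta + K * omega_tau
    c = 1.0 + K
    disc = cmath.sqrt(b * b - 4.0 * c)
    return (-b + disc) / 2.0, (-b - disc) / 2.0


omega, zeta, omega_t, zeta_t = 0.412, -0.2425, 15.0, 0.7
K = (omega_t / omega) ** 2 - 1.0                                   # from (2.3)
tau = (2.0 * omega_t * zeta_t - 2.0 * omega * zeta) / (omega ** 2 * K)

d1, d2 = d_roots(K, omega * tau, zeta)
lam_t1 = omega_t * complex(-zeta_t, math.sqrt(1.0 - zeta_t ** 2))  # root of (2.5)
print("omega * d1 =", omega * d1)
print("lambda_T1  =", lam_t1)
```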


To see the significance of this, remember that the
important term in the residue is $z^n$, which can be
written $\exp(n \ln z)$. But $z$ in the form (2.23) can be
used to expand $\ln z$ to yield

$$z^n = \exp\{n d\,\omega T + n(e - d^2/2)\omega^2T^2 + n[f - d^3/6 - d(e - d^2/2)]\omega^3T^3 + \cdots\} \tag{2.27}$$

Thus the first term is part of the exact solution at
$t = nT$, and subsequent terms are error terms.

With two roots $z_1$, $z_2$ given as a complex conjugate
pair by (2.23)-(2.26), the third root is simple to find by
dividing $\Delta$ by $(z - z_1)(z - z_2)$. The expanded result is,
using (2.24) and (2.25),

$$z_3 = (1 - h)K\omega\tau\,\omega T + K[l - k - 1/2 + \zeta\omega\tau(1 - 2h) + K(1 - h)\omega^2\tau^2]\omega^2T^2 + \cdots \tag{2.28}$$

The solution is usually stable with respect to the roots $z_1$, $z_2$
because the real part of $d$ is negative, so the dominant
term of $z^n$ is a damping. However, it may be unstable with
respect to $z_3$, and will be for large enough $\omega T$.

Before choosing values for the compensation parameters,
we will look at the actual solution generated by
these roots.

Solution in the physical (time) domain 

The solution is the sum of the residues of $z^{n-1}x^*$
at the poles $z = 1$, $z_1$, $z_2$, $z_3$, with $x^*$ given by (2.22). The
residue at $z = 1$ is easily found by putting $z = 1$ into
$(z - 1)x^*$, which yields

$$\mathrm{Residue}\,(z = 1) = Kx_c/(1 + K) \tag{2.29}$$

which is just the constant part of the exact solution
(2.4).

Since $z_1$ and $z_2$ are complex conjugates, so are their
residues, and their sum is twice the real part of either.
If the expansion (2.23) is put into (2.22) and (2.21), the
result for $z_1$ to order $\omega T$ is found to be

zi n Kx c 

Residue (zi) = 

2(1 + K) 

i/?rCdT V if T ✓ PiO ) T V-| 

1 + J + — ( 1 jj (2.30) 

03 rf r/co fir co rf r/co 

ei — dl 2 / 2 = fir + ifii 
Finally, the residue at $z_3$ is found similarly using (2.28):

$$\mathrm{Residue}\,(z_3) = 2^{-1}(\omega T)^{n+3}[K\omega\tau(1 - h)]^{n+1} \tag{2.31}$$

Choice of compensation constants 

Comparison of (2.27) and (2.29) with the exact
solution (2.4) shows that the first deviation of both the
$z^n$ factor and the rest of the expression depends on
$e - d^2/2$. If this term is zero, the deviation will then be
$O(\omega^2T^2)$ in both places. And (2.31) shows that the
contribution of the extraneous solution is of high order
in $\omega T$ and should decrease rapidly as long as $|z_3| < 1$.

These observations lead to the conclusion that we
should make $e - d^2/2$ vanish, which means, according
to (2.25),

$$h = l = 3/2 \tag{2.32}$$

The coefficient $k$ is not determined to this order.
However, if $e - d^2/2 = 0$ the next term in (2.27) is
found from the expansion of (2.21) to be

$$f - d^3/6 = \frac{-Kd\{d[k - 13(1 + d\omega\tau)/12] - 2K\omega\tau(1 + d\omega\tau)/3\}}{2(d + \omega_T\zeta_T/\omega)}$$

This cannot vanish for any choice of real $k$. One can
make either its real part or its imaginary part vanish,
although $k$ will then depend on the parameters of the
problem. One obvious choice which reduces the size
of $f - d^3/6$ is

$$k = 13/12 \tag{2.33}$$

and this is the one used in the implementation. Further
study would be needed to determine if another, more
complicated, choice were better.

Notice that the values given in (2.32) are exactly
those shown in (2.15), which are obtained by projecting
to $(n + 1/2)T$, while the $k$ of (2.33) is only 1/24 smaller
than the corresponding value of $k$ in (2.15). One can
therefore look upon the analysis as providing a derivation
of the length of the projection interval, in contrast
with the usual graphical or intuitive arguments.

Results for three-step compensation 

An entirely analogous solution can be obtained using
the three-step projection of (2.13). The necessary
starting values $\delta_{-1}$ and $\delta_{-2}$ are taken the same as $\delta_0$.

The determinant of the coefficients is now of fifth degree,
with five roots. Two are of the form (2.23) with $d$ the
same, (2.24). The next coefficients are

$$e = \tfrac{1}{2}d^2 + \frac{Kd(1 + d\omega\tau)(a_1 + 2a_2 + 3/2)}{2(d + \omega_T\zeta_T/\omega)} \tag{2.34}$$

and, if $e - d^2/2 = 0$,

$$f - d^3/6 = \frac{-K(1 + d\omega\tau)[Kd\omega\tau/12 + d^2(a_2 - 22/12)]}{2(d + \omega_T\zeta_T/\omega)} \tag{2.35}$$

The other three roots are power series in $(\omega T)^{1/3}$, given
in terms of

$$r = (-1 + i3^{1/2})/2, \qquad \bar{r} = (-1 - i3^{1/2})/2$$

by

$$z_{3,4,5} = (K\omega\tau a_2\,\omega T)^{1/3}(1, r, \bar{r}) + (K\omega\tau a_2\,\omega T)^{2/3}(a_1 + a_2)(1, \bar{r}, r)/3a_2 + K\omega\tau a_2\,\omega T/3a_2 + \cdots \tag{2.36}$$

The residues at $z = 1$ and $z = z_1$ are the same as for
Taylor series compensation, (2.29) and (2.30). The first
terms of the residues of the other three roots are

$$\mathrm{Residue}\,(z_{3,4,5}) = (Kx_c/6)(\omega T)^{(n+7)/3}(K\omega\tau a_2)^{(n+1)/3}(1, r, \bar{r})^{n+1} \tag{2.37}$$

To make the $O(\omega T)$ errors vanish we make
$e - d^2/2 = 0$ by taking

$$a_1 + 2a_2 = -3/2$$

which agrees with (2.14). To determine $a_1$, $a_2$ separately
one can go to (2.35) and choose $a_2 = 22/12$, which is
1/24 less than the value in (2.14). So again we come very
close to the usual projection distance by an analytical
derivation.

The error caused by the extraneous roots should not
be as small for this type of compensation, since it
depends on $(\omega T)^{n/3}$, and decreases rather slowly as $n$
increases.

The solution is also less stable, because of the
one-third power dependence of the roots on $\omega T$. In fact,
the absolute values through the first two terms are

$$|z_{3,4,5}| = (K\omega\tau a_2\,\omega T)^{1/3}\,\left|1 + (1, -\tfrac{1}{2})(K\omega\tau a_2\,\omega T)^{1/3}(a_1 + a_2)/3a_2\right| \tag{2.38}$$

and since $a_1 + a_2$ is negative, the conjugate pair $z_4$, $z_5$ is
the least stable. This is the pair introduced by going
back two steps in time, which shows the destabilizing
influence of that procedure.
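
The expansion (2.36) can be evaluated numerically to see how far the extraneous roots lie from the unit circle. The sketch below sums the three terms written out in (2.36); the function name is illustrative, and the parameter values (ω = 0.412, ζ = -0.2425, ζ_T = 0.7, ω_T = 15, with K and τ from (2.3)) are those of the hybrid runs reported in the implementation section.

```python
import math


def three_step_extraneous_roots(K, omega_tau, omega_T, a1, a2):
    """z3, z4, z5 from the first three terms of the expansion (2.36)."""
    r = complex(-0.5, math.sqrt(3.0) / 2.0)
    r_bar = r.conjugate()
    g = K * omega_tau * a2 * omega_T      # positive for the constants of (2.14)
    cube_root = g ** (1.0 / 3.0)
    roots = []
    for first, second in ((1.0, 1.0), (r, r_bar), (r_bar, r)):
        z = (cube_root * first
             + cube_root ** 2 * (a1 + a2) * second / (3.0 * a2)
             + g / (3.0 * a2))
        roots.append(z)
    return roots


omega, zeta, omega_t, zeta_t = 0.412, -0.2425, 15.0, 0.7
K = (omega_t / omega) ** 2 - 1.0
tau = (2.0 * omega_t * zeta_t - 2.0 * omega * zeta) / (omega ** 2 * K)
a1, a2 = -21.0 / 4.0, 15.0 / 8.0          # conventional constants (2.14)

for T in (0.025, 0.010):
    z3, z4, z5 = three_step_extraneous_roots(K, omega * tau, omega * T, a1, a2)
    print(f"T = {T:.3f}: |z3| = {abs(z3):.3f}, |z4| = |z5| = {abs(z4):.3f}")
```

For T = 25 ms this gives a magnitude of roughly 0.58 for the real root and roughly 1.38 for the conjugate pair, and at T = 10 ms the pair drops to about 0.93, close to the values quoted with the hybrid runs.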

Stability considerations 

As mentioned already, it is the extraneous roots
which control the stability of the hybrid calculation.
For the Taylor series compensation, this root is given
by (2.28), and is of the order $K\omega\tau\,\omega T$, the same as for
the uncompensated case, which can be obtained from
(2.28) by putting $k = h = l = 0$. In fact, the compensated
root is somewhat smaller (thus more stable)
since the coefficient of the first term is $-1/2$ instead
of 1. Notice that one could improve the stability,
at some cost in accuracy, by choosing $k$ so that
the coefficient of the second term in $z_3$ vanishes,
although $k$ would then depend on the parameters of the
problem instead of being constant.

In contrast, the extraneous roots for three-step
compensation are given in (2.38) and are of order
$(a_2K\omega\tau\,\omega T)^{1/3}$, considerably larger than the
uncompensated or Taylor series cases. Therefore, the
three-step method yields a less stable solution. If $a_2 = 0$,
we then have a two-step scheme, and there are only
two extraneous roots, of order $(K\omega\tau\,\omega T)^{1/2}$, more
stable than the three-step scheme but still less stable
than the uncompensated or Taylor series cases.

If the scheme of Howe and Fogarty, discussed in the
Introduction, were used, there would also be two
extraneous roots of order $(K\omega\tau\,\omega T)^{1/2}$, so the stability
would be about the same as for a two-step scheme. In
fact, the two-step and Howe-Fogarty schemes are
closely related, both going back one step in time.

Computer implementation 

The Taylor series (or derivative) method of compensation
was tested, and compared with the three-step
method, by solving the problem posed by (2.1),
(2.2) on the hybrid computer of the NASA Electronics
Research Center. This is a Beckman 2200/SDS 9300
machine with interface built by Beckman.

As described above, the integration of $\ddot{x}$ and $\dot{x}$, and
the combination of $\dot{x}$ and $x$ on the left side of (2.1), were
performed in the analog computer. The value of $\delta$ was
found in the digital computer, by sampling $x$ and $\dot{x}$ from
the analog at intervals of $T$ and extrapolating. Then
$\delta_p$ was calculated and sent back to the analog to be
used to find $\ddot{x}$. The A to D sampling was accomplished
first, followed immediately by the D to A updating.
In order to compare the resulting hybrid solution with
a continuous solution, the complete equation was also
solved in the analog simultaneously as an oscillator
with frequency and damping as defined by (2.3).
The details of the analog circuit, the digital programs,
the control circuit, the scaling, etc., are given in Ref.
7, pages 130-141 and Appendix E.

The output of this calculation was a set of strip
charts and digital printouts giving the hybrid and pure
analog values of $x$, $\dot{x}$, $\ddot{x}$, $\delta$, and the difference between
the hybrid and analog values, which may be taken as a
measure of the error of the hybrid solution.

Runs were made for the parameters

$$\omega = 0.412, \qquad \zeta = -0.2425, \qquad \zeta_T = 0.7$$

using the conventional compensation constants

$$a_1 = -21/4, \qquad a_2 = 15/8, \qquad a_0 = 1 - a_1 - a_2 = 35/8$$

for the three-step method, and the set

$$l = h = 3/2, \qquad k = 13/12$$

which we have derived for the Taylor series method.
The values of $\omega_T$ were varied between 0.5 and 15.0. For
each such value, the control parameters $K$ and $\tau$ can
be calculated from (2.3). Runs were made at several
sample intervals $T$ in order to study the stability of
the hybrid calculation. For large enough $T$ it was
always possible to make it unstable.

The relative merits of the Taylor series and three-step
compensation schemes, compared to pure analog
and uncompensated hybrid results, are strikingly
illustrated by excerpts from the strip charts drawn
by the analog computer. The case chosen for illustration
is $\omega_T = 15$, for which (2.3) gives $K = 1324$, $\tau = 0.0942$.

Figure 1 shows the strip chart record for $x(t)$ for
four cases. At the top is the continuous solution produced
by a pure analog calculation. Below follow the
records for the uncompensated, Taylor series compensated,
and three-step compensated hybrid solutions,
all for a sample interval $T = 25$ milliseconds, which is
17 samples per cycle based on total frequency. In order
to bring out the errors more clearly, Figure 2 shows the
difference signal $x_H - x_A$ on a larger scale, where the
subscripts H and A stand for hybrid and analog,
respectively. The great improvement in accuracy achieved
by going from no compensation to Taylor series
compensation is apparent. On the other hand, the
solution with three-step compensation is unstable and
saturates the amplifiers.

The stability properties of these three cases are
predicted by the formulas we have developed. For no
compensation ($l = h = k = 0$), (2.28) gives $|z_3| = 0.736$,
while for Taylor series compensation the same
formula shows $|z_3| = 0.411$. On the other hand, for
three-step compensation, (2.38) gives $|z_3| = 0.578$,
$|z_4| = |z_5| = 1.375$. Therefore, the part of the solution
corresponding to the root $z_3$ is stable, but the part
corresponding to the roots $z_4$, $z_5$ is unstable, leading
to an unstable solution, as shown in Figures 1 and 2.
To stabilize the three-step case, the sample interval
$T$ would have to be reduced to 10 ms, or about 42
samples per cycle, for which (2.38) shows $|z_4| = 0.928$.
A case run at this value of $T$ indeed showed three-step
compensation to yield a stable solution.

To destabilize the uncompensated and Taylor series
cases, a run was made at T = 50 ms (8.4 samples per
cycle), for which (2.28) gives $|z_3| = 1.89$ and $1.12$,
respectively. The results of the run are shown in Figure
3, where the rapid increase of x until the amplifiers
saturate is seen for both cases.
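
The sampling figures and stability verdicts quoted above can be checked with a few lines of arithmetic. The sketch below assumes that "samples per cycle" means 2π/(ω_T T) with ω_T = 15 taken in rad/sec, an interpretation that reproduces the quoted 17, 42 and 8.4. The root magnitudes are copied from the text rather than recomputed, since (2.28) and (2.38) are not reproduced in this section, and the function names are illustrative only.

```python
import math

OMEGA_T = 15.0  # total frequency of the chosen case, assumed to be in rad/sec


def samples_per_cycle(sample_interval_s, omega=OMEGA_T):
    """Number of samples taken in one period 2*pi/omega."""
    return 2.0 * math.pi / (omega * sample_interval_s)


def is_stable(root_magnitudes):
    """A sampled solution is stable when every root lies inside the unit circle."""
    return all(mag < 1.0 for mag in root_magnitudes)


# Root magnitudes quoted in the text (not recomputed here).
cases = {
    ("uncompensated", 0.025): [0.736],
    ("Taylor series", 0.025): [0.411],
    ("three-step", 0.025): [0.578, 1.375],
    ("three-step", 0.010): [0.928],
    ("uncompensated", 0.050): [1.89],
    ("Taylor series", 0.050): [1.12],
}

for (method, T), mags in cases.items():
    verdict = "stable" if is_stable(mags) else "unstable"
    print(f"{method:14s} T = {T * 1e3:4.0f} ms  "
          f"{samples_per_cycle(T):5.1f} samples/cycle  {verdict}")
```

The verdicts follow directly from comparing each quoted magnitude with 1, which is the criterion the run results confirm.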

Similar results hold for other values of $\omega_T$. In all cases,
stability or instability exhibited by the numerical 
calculations could be predicted in advance by use of 
(2.28) or (2.38). Furthermore, the digital printouts 
showed that with the same set of parameters and 
sampling interval, the Taylor series method gave more 
accurate results, that is, results closer to the analog 
(continuous) solution. The improvement in accuracy
could be quite marked for sample intervals near the
stability limit of the three-step method. This is in
accord with the deductions from the extraneous
solutions (2.31) and (2.37).

Figure 3 — Strip chart records of x(t) for co = 0.412,
= −0.2425, fr = 0.7, $\omega_T = 15$. The sample
interval T = 50 ms.

CONCLUSIONS 

The Taylor series (or derivative) method of compensa- 
tion appears to have a number of advantages over the 
three-step method of compensation for the time delay 
and D to A hold errors of hybrid computing. For a 
given case, it can be made stable for larger sampling 
intervals than the three-step method, and is more accurate
at the same sampling interval. The Taylor
series method can also be made stable for larger sampling
intervals than the uncompensated case for almost
all values of the parameters, while the three-step 
method may well be unstable when the uncompensated 
calculation is stable. In other words, compensating by 
Taylor series can improve the stability, while com- 
pensating by the three-step method destabilizes. 

These stability advantages of the Taylor series 
method depend to a large extent on the particular form 
of implementation used. The crucial point is not to use 
information which goes back in time, since each such 
backward time step introduces an extraneous solution 
which is de-stabilizing. The implementation suggested 
here, where the extrapolations are accomplished in the 
digital computer, avoids extra backward time steps 
while still permitting the A to D sampling to be done 
before the D to A transfer. If the extrapolations are 
done in the analog computer, as in the Howe-Fogarty 5,6
implementation, the A to D before D to A sequence of
operations introduces one backward time step and
adversely affects the stability. If the sequence is performed
in the order D to A followed by A to D, the
analog extrapolation of Howe and Fogarty would give 
exactly the results of the present analysis. 

The z-transform method of analysis for linear equa- 
tions can be carried through with arbitrary coefficients 
in the extrapolation formulas. Then they can be chosen 
to yield the desired improvement in accuracy and/or 
stability. The coefficients of the first power of the sample 
interval T clearly should be chosen to extrapolate by 
$1.5T$, but the coefficient of $T^2$ in the extrapolation formula
for x has some flexibility in the choice, depending
on whether accuracy or stability is the paramount 
consideration. 


When the derivatives are available, there is no more
difficulty in implementing the Taylor series method than
the three-step method, and there are no starting problems
with the former, as there are with the latter.

On the basis of the analysis and numerical results 
of this study, the Taylor series method of compensa- 
tion seems preferable in all ways to the three-step 
method, and can be recommended whenever the derivatives
are available. Whether this conclusion will also
hold for non-linear equations and for higher order
systems depends on the results of applying the Taylor
series method to those cases. Some preliminary study
of a linear fourth order system by the present method
of analysis indicates that the Taylor series method may
be applicable, but with different values of the compensation
coefficients.


HYPAC — A hybrid-computer circuit
simulation program

by PHILIP BALABAN 

Bell Telephone Laboratories 
Holmdel, New Jersey 


INTRODUCTION 

Computer simulation of electronic circuits and systems 
has become an increasingly important tool in circuit 
and system design. Such simulations enable one to: 

1. Eliminate the necessity of building many bread- 
board models in order to evaluate different 
design approaches. 

2. Analyze the performance of the circuit as a 
function of different parameters. 

3. Model semiconductor devices and integrated 
circuits so that intrinsic parameters become 
accessible. 

4. Perform optimization and tolerance analysis 
of a circuit which requires many evaluations 
of the circuit with different sets of parameters. 

The frequency domain analysis of a linear circuit 
is usually simulated on the digital computer and the 
solutions obtained are accurate and fast. The analysis 
in the time domain is more difficult with both analog 
and digital computers often being used for this pur- 
pose. The analog computer simulation technique 
features fast solution times and designer interaction; 
the digital computer simulation programs have the 
advantage of a large dynamic range and very simple 
programming. 

Unfortunately, both types of simulations can handle 
only relatively small circuits, since the analog computer 
is hardware limited (a six to eight transistor circuit 
can be patched on a large analog computer) and the 
digital computer requires an excessive amount of 
computation time, especially when the eigenvalues of 
the system are far apart. 


The HYPAC hybrid computer program was developed
in order to overcome the above-mentioned
shortcomings for a special class of problems. This 
class includes systems that have a modular structure 
where a few types of a particular subcircuit (amplifier, 
gate, etc.) are used repetitively. Such systems are very 
common, especially since the advent of integrated 
circuits. 

The program structure 

The program takes advantage of the speed of the 
analog computer and the possibility of storing in- 
formation on the digital computer. The principle 
is the following: 

A whole subcircuit (such as an amplifier or gate) 
is patched on the analog computer and multiplexed 
by the digital computer to form a large system. Thus, 
the digital computer regards this one subcircuit as 
N distinct subcircuits. Each subcircuit has its own 
distinct inputs and parameters which the digital 
computer provides sequentially to the analog computer. 
In addition to the interconnection and memory
capability, it is very convenient for the multiplexer
to have the capacity to model different circuit elements.
Therefore, a general purpose, widely used
block-oriented digital program called “PACTOLUS” 1
was chosen as the vehicle for this hybrid program.
Hence the name HYbrid PACTOLUS, or HYPAC.

Description of PACTOLUS

PACTOLUS can be described as a block-oriented 
interpretive language. The program incorporates all 
standard analog computer elements (integrators, summers,
multipliers, etc.). In addition, the program allows
a few special elements of unspecified function. The 
user may write his own subroutine for any function 
he desires. 

We shall consider the computing operation procedure
implied by Figure 1 where

$X_n$ = the input vector at $t = n\Delta T$

$Y_n$ = the output vector at $t = n\Delta T$

$\dot{Y}_n$ = the derivative vector at $t = n\Delta T$

At time t = 0, all the input vectors $[X_0]$ and the state
vector $[Y_0]$ (the initial conditions of all integrators) are
given. From these given conditions, the derivative
vector $[\dot{Y}_n] = F([X_n], [Y_n])$ can be computed. $[Y_{n+1}]$ is
computed from $[\dot{Y}_n]$ using any integration method. In
the original PACTOLUS, a second order Runge-Kutta
method is used.
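
The block-evaluation loop described above can be sketched as follows. This is a minimal illustration and not PACTOLUS code: the function names, the choice of the Heun (trapezoidal) variant of second order Runge-Kutta, and the assumption that the input is held constant over a step are ours, since the text does not say which second order variant the program uses.

```python
def rk2_step(F, x_n, y_n, dt):
    """Advance the state vector y_n by one step dt using Heun's method.

    F(x, y) returns the derivative vector for input x and state y.
    The input x_n is held constant over the step (an assumption).
    """
    k1 = F(x_n, y_n)
    y_pred = [y + dt * k for y, k in zip(y_n, k1)]   # Euler predictor
    k2 = F(x_n, y_pred)
    return [y + 0.5 * dt * (a + b) for y, a, b in zip(y_n, k1, k2)]


def simulate(F, inputs, y0, dt):
    """Return the list of states Y_0, Y_1, ... for a sequence of inputs."""
    states = [list(y0)]
    for x_n in inputs:
        states.append(rk2_step(F, x_n, states[-1], dt))
    return states


# Example: one integrator with feedback, dy/dt = x - y (a first-order lag).
def lag(x, y):
    return [x[0] - y[0]]


trajectory = simulate(lag, inputs=[[1.0]] * 1000, y0=[0.0], dt=0.01)
print(trajectory[-1][0])   # close to 1 - exp(-10)
```

Each pass through the loop corresponds to one interval ΔT: the derivative vector is evaluated from the current inputs and states, and the states are then advanced.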

Modification of PACTOLUS for HYPAC

The original PACTOLUS was supplemented and 
modified in many ways to convert it into a general 
purpose hybrid program. Some of the added features 
are briefly illustrated below. 

The hybrid element 

The hybrid element was conceived to be a special 
element of the PACTOLUS repertoire. The analog 
subcircuit of the hybrid element is patched on the 
analog computer as in Figure 2. 

The inputs and the initial conditions are supplied 
through D/A converters by the digital computer. 
The outputs of the circuit and the outputs of all the 
integrators are fed back to the digital computer through 
A/D converters.

In HYPAC, the hybrid block is sorted as an integrator,
which implies that at time t = 0, the outputs
of the circuit and the initial conditions of the
integrators are known. The inputs to the circuit are then
evaluated from the given values and applied through
the D/A converters to the inputs of the analog block.
The stored initial conditions (IC) for each integrator
are applied to the appropriate integrators. After the
integrators have settled to their respective initial
conditions, the analog computer is switched into the
operate mode for a time $\Delta T$.* At the end of this period,
the outputs of this circuit are sampled and transferred
to the digital computer. The outputs of the integrators
are also sampled and stored in the digital computer
to be used as initial conditions in the next time period.
If N such circuits are used in the system, the HYPAC
program regards them as N distinct subcircuits. The
inputs and outputs are regarded as coming from
distinct blocks and used accordingly.

Figure 2 — The hybrid element

The subcircuits must have an identical topology;
the parameters, however, can be different and can be
changed before each run using digitally controlled
attenuators.
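
The multiplexing principle described above can be illustrated with a purely digital stand-in for the analog computer. In the sketch below, `analog_run` plays the role of the single patched subcircuit (assumed here to be a first-order RC stage, solved exactly over the interval much as an analog integrator would), while the digital side keeps one stored initial condition and one parameter value per logical block. All names, the choice of subcircuit and the cascade interconnection are illustrative assumptions, not part of HYPAC.

```python
import math


def analog_run(v_in, v_ic, rc, dt):
    """Stand-in for one OPERATE period of the patched subcircuit.

    Models an RC low-pass stage with the input held constant for dt,
    starting from the initial condition v_ic, and returns the
    integrator output at the end of the period.
    """
    return v_in + (v_ic - v_in) * math.exp(-dt / rc)


def hybrid_step(source, states, rc_values, dt):
    """Serve N logical blocks with the one 'analog' subcircuit.

    Block i takes its input from the stored output of block i-1 at the
    start of the step (block 0 takes the external source).
    """
    inputs = [source] + states[:-1]
    new_states = []
    for v_in, v_ic, rc in zip(inputs, states, rc_values):
        # Set parameters, apply input and IC, operate for dt, read back.
        new_states.append(analog_run(v_in, v_ic, rc, dt))
    return new_states


# Three cascaded stages: identical topology, different parameters.
rc_values = [1.0e-3, 2.0e-3, 0.5e-3]
states = [0.0, 0.0, 0.0]            # stored initial conditions
for _ in range(5000):
    states = hybrid_step(1.0, states, rc_values, dt=1.0e-4)
print(states)                        # each stage settles toward the 1.0 V source
```

The stored `states` list is what lets one physical subcircuit stand for N distinct blocks: the only per-block data are the initial conditions and the attenuator settings.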

The output configuration 

The original PACTOLUS allows only one output 
per element. Since the hybrid element is actually a 
complex circuit, it will usually have more than one 
output. Therefore, another special element was 
assigned to handle multiple outputs. For each additional
output, one such fictitious output element “$\phi$”
is assigned. In Figure 3, the second and third outputs
of the hybrid element No. 1 are read out from output
elements 1 and 3.

The output elements simply store the value of the
output A/D converters. The D/A and A/D converters
usually come in pairs; therefore, this program is coded
so that to each input corresponds an output, i.e.,
to input No. 1 corresponds output No. 1; to input
No. 2, output No. 2. This associates each output with
a corresponding D/A converter.

* In order to operate reasonably efficiently and accurately, the
analog computer must have electronic mode control.

Figure 3 — The output configuration of the hybrid
element

The integration algorithm 

The integration algorithm was changed to the 
simple Euler method since this is closest to the way 
an analog integrator performs in the hybrid element. 
However, a closed loop integration method and a 
variable integration step method are now being studied.

Detailed operation of the hybrid program 

As noted before, the hybrid element is sorted as an
integrator. All inputs $[Y_n]$ to the hybrid elements and
inputs $[\dot{I}_n]$ to the integrators at time $t = n\Delta T$ (shown in
vector form in Figure 4) are computed from the output
vectors $[Z_n]$ and $[I_n]$ of these elements and the input
vector $[X_n]$ to the circuit at time $t = n\Delta T$. The
computation of $[I_{n+1}]$ is described in an earlier section.

The computation of $[Z_{n+1}]$ from $[Y_n]$ and the
corresponding initial conditions can be understood by
examining the sequence of operations shown in the
timing diagram in Figure 5.

All the commands and timing signals of the analog
computer are generated in the digital computer. First,
the analog computer is switched into the “IC” (initial
condition) mode, (1) in Figure 5. Then the parameters
of the hybrid element are adjusted through the digitally
controlled attenuators (DCAs) (2). The inputs and
the corresponding initial condition voltages are then
applied to the hybrid element (3). The hybrid element
is left in this mode for a constant time necessary for
the integrators to settle. The analog computer is then
switched into the “OPERATE” mode for the time $\Delta T$,
which is specified by the user at the beginning of each
run. At the end of this period, the analog computer
is switched into the “HOLD” mode for a very short
time (100 μsec), long enough for the A/D converters
to be read out. The outputs of the A/D converters
are then the outputs of this hybrid element at time
$t = (n + 1)\Delta T$. The analog computer is ready to be
used as the next hybrid element.

Figure 4 — Principle of operation of HYPAC

Timing diagram: IC MODE, OPERATE, HOLD

SEQUENCE:

1. ANALOG COMPUTER SWITCHED INTO "IC" MODE.

2. DIGITALLY CONTROLLED PARAMETERS SET.

3. INPUTS AND INITIAL CONDITIONS APPLIED.

4. ANALOG COMPUTER SWITCHED INTO "OPERATE" MODE.

5. ANALOG COMPUTER SWITCHED INTO "HOLD" MODE
AND $Z_n$ IS READ INTO DIGITAL COMPUTER.

Figure 5 — Sequence of operation of the hybrid element

Selection of the integration interval $\Delta T$

The overall circuit has to be scaled in the time domain
to analog computer compatible frequencies,
usually smaller than 1 kHz. The smallest possible
$\Delta T$ which is provided in the program is one msec.
In order for the solution to converge, the eigenvalues
of the circuit should not exceed the value of
$2/\Delta T = 2 \times 10^3$.** (This restriction applies only to hybrid and
digital loops in the circuit and not to loops confined to
the analog computer.) Therefore, selection of the time
scale defines the upper bound of the integration
interval. On the other hand, $\Delta T$ should be large enough
to minimize the effect of noise and truncation
by the A/D converter. It is, therefore, very important
to scale the analog computer so that the integrators
work at the highest possible level of the voltage range.
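
The bound 2/ΔT can be seen on the scalar test equation dy/dt = −λy, for which one Euler step multiplies y by (1 − λΔT). The iteration decays only when |1 − λΔT| < 1, that is, when λ < 2/ΔT. The sketch below is a worked illustration for a single real eigenvalue; the function names and the sample eigenvalues are our own choices.

```python
def euler_decay(lam, dt, steps, y0=1.0):
    """Apply `steps` explicit Euler steps to dy/dt = -lam * y."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y


def euler_is_stable(lam, dt):
    """Explicit Euler is stable for this equation when |1 - lam*dt| < 1."""
    return abs(1.0 - lam * dt) < 1.0


dt = 1.0e-3                # the smallest interval provided by the program
limit = 2.0 / dt           # 2 x 10^3

for lam in (500.0, 1900.0, 2100.0):
    print(lam, euler_is_stable(lam, dt), abs(euler_decay(lam, dt, 200)))
```

An eigenvalue just above the limit already makes the computed solution grow from step to step, which is why the time scale has to be chosen before ΔT.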

Selection of the initial condition settling time

The initial condition settling time is dependent on 
the value of the integrating capacitor and the maximum 
current of the output stage of the operational amplifier 
and is, therefore, dependent on the computer used. 
However, in any analog computer, the IC settling 
time is directly proportional to the integrator capaci- 
tance. Therefore, the overall computing time is sub- 
stantially reduced by choosing a faster integrator mode. 
On the other hand, the impairments introduced into 
the integrators by switching transients and integrator 
drifts are inversely proportional to the value of the 
integrator capacity. 

The experimental example described in a later section
was run both with 1 μF capacitors with 20 msec IC
settling time and with 0.01 μF capacitors with 1 msec IC
settling time. No noticeable deterioration of accuracy
was detected in this example, but this may not be
generally true, and if solution time is not critical, a
larger integrator capacitor should be used.

Accuracy considerations

HYPAC is both a digital and an analog system;
therefore, all factors that produce errors in digital
and analog differential analyzers will also produce
error in this system. These factors are many and are
extensively covered in the literature. 1,2,3,4 They include
finite sampling, round off and quantization in the
digital system and limited bandwidth, noise (limited
dynamic range), accuracy and linearity of components,
etc., in the analog system.

The accuracy consideration which is unique to this 
program is connected to the way the initial conditions 
are set up in the hybrid element. 

1. The initial condition of any particular run has
come through an Analog to Digital to Analog
conversion string and was therefore truncated
by the A/D converter.

2. Switching transients are generated whenever
the integrators are switched from one operation
mode to another. These transients are caused
by charges stored in the parasitic capacities
of the switches, and affect the outputs of the
integrators. The magnitude and polarity of
these small voltage increments caused by switching
transients can be regarded as random. 6

** If the circuit is nonlinear, the eigenvalues should be evaluated
at the worst possible combination of parameters and biases.

The error contributions of these two factors are
minute for each integration period $\Delta T$ (a few millivolts),
but since a solution usually consists of a few
hundred integration periods, the propagation of these
errors can be very significant. It is the author’s feeling
that the errors will not build up if the system itself,
and the hybrid element in particular, is stable. Although
our experiments have confirmed this, the above statement
is rather intuitive and needs further investigation.

Programming of HYPAC 

In order to demonstrate how a problem is prepared 
for simulation, let us consider a simple example. 

The circuit in Figure 6a is a set-reset flip-flop composed
of two NOR gates; a positive voltage $V_1$
resets the flip-flop and $V_2$ is the set input. The circuit
diagram of the NOR gate used is shown in Figure 6b.

The complete circuit is depicted in Figure 7. Let
us assume that it was decided to use the outlined
subcircuits as the hybrid element in this example.
Since this element is not buffered from its input and
output circuits, some special approach is needed in
order to extract this subcircuit from the whole circuit.
The approach we used is called the partition method
and is described in the Appendix.

Using this method, the circuit can be partitioned
into blocks as shown in Figure 8.

This circuit is identical to the one in Figure 7 if
the relations

$$I_{B1} = i_{B1}, \quad V_{B1} = -v_{B1}, \quad I_{O1} = i_{O1}, \quad V_{O1} = -v_{O1}$$

$$I_{B2} = i_{B2}, \quad V_{B2} = -v_{B2}, \quad I_{O2} = i_{O2}, \quad V_{O2} = -v_{O2}$$

are satisfied.

Figure 6a — Set-reset flip-flop

Figure 6b — NOR gate

Figure 7 — Set-reset flip-flop

The programming of HYPAC is reduced to writing 
the nodal or loop equations of the circuit. The HYPAC 
block diagram is shown in Figure 9. The blocks H1 and
H2 are hybrid elements; blocks $\phi_1$ and $\phi_2$ are associated
with the second outputs of the hybrid elements; all
other blocks are conventional PACTOLUS elements.

The hybrid element can be patched on the analog 
computer as shown in Figure 10. The transistor model 
used in this simulation is the analog separation model 6 
based on the charge control equations. The inputs $I_B$
and $I_O$ and the initial conditions are applied through
D/A converters. The outputs of the subcircuit and the 
outputs of each integrator are read out through A/D 
converters. The outputs of the integrators are stored 
to be used as the next initial conditions. 

While using this method of programming, one
should be careful not to introduce hybrid algebraic
loops, which, of course, can be highly unstable. Such
loops can be easily spotted by inspection and can usually
be eliminated by placing them wholly in the hybrid
element.

Figure 9 — HYPAC block diagram for set-reset flip-flop

Figure 10 — Analog simulation of a NOR gate

Experimental results 

In order to demonstrate the effectiveness of the 
program, a more extensive example was programmed.* 
The example consisted of a trigger flip-flop composed 
of four half-nanosecond NOR gates. (Figure 11). 

The flip-flop changes state every time a negative 
(zero) pulse is fed to the input. Since the NOR gates 
are identical, an entire gate was programmed as a 
hybrid element. The schematic of the half nanosecond 
NOR gate is depicted in Figure 12. The outlined part 
was simulated on the analog computer as shown in 
Figure 13. 


* The hybrid computer used in this experiment consisted of
an EAI-8800 analog computer linked with a CDC-3300 digital
computer.



Figure 11 — Trigger flip-flop 



Figure 12 — NOR gate 



Figure 13 — Analog computer simulation of a NOR gate 

The transistors and diodes are simulated using the 
separation technique. Only the relevant reactances 
for the operation of this gate in the flip-flop circuit
were simulated. The HYPAC block diagram of the 
flip-flop is shown in Figure 14. The R and C elements 
at the inputs of the gate were included in the digital 
simulation of HYPAC in order to reduce the number 
of D/A and A/D converters necessary for simulation. 

Since the risetime of the circuit is approximately
$t_r = 0.5$ nsec, we assumed that the largest eigenvalues
of the circuit must be around
$\omega_{\max} = 10/t_r = 2 \times 10^{10}$. To
be on the safe side, we chose a time scale $\alpha_T = 10^8$ so
that 1 msec machine time = 0.01 nsec real time. The
voltage scale was $\alpha_V = 10$ and the current scale
$\alpha_I = 1{,}000$.

A typical solution is shown in Figure 15. A is the 
waveform applied to the input of the circuit (HYPAC 
element No. 2 in Figure 14), B is the output waveform 
of the NOR gate No. 1 in Figure 11, or HYPAC 
element No. 14 in Figure 14. C is the waveform at the 
output of NOR gate No. 3 in Figure 11 or HYPAC 
element No. 24 in Figure 14. 

The integration interval for this experiment was
$\Delta T = 4$ msec, corresponding to 0.04 nsec in real time.
The waveforms in Figure 15 represent 16 nsec of the 
solution time. These waveforms correspond quite 
closely to ones observed in breadboard experiments. 




Figure 15 — Waveforms of simulated flip-flop 


The total computing time was 70 sec for the program
when the hybrid element used 20 msec IC settling
time per run, corresponding to 1 μF integrator capacitor
value. The computing time was reduced to 40 sec
when the IC settling time was 1 msec, corresponding
to 0.01 μF integrator capacitor value.
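
The scale factors and run times quoted above can be cross-checked with simple arithmetic. The sketch assumes that each of the four NOR gates is served once per integration step (four hybrid-element runs per step), which the text implies but does not state outright. Under that assumption the change in IC settling time alone accounts for about 30 sec, in line with the reported drop from 70 sec to 40 sec. Variable names are illustrative.

```python
TIME_SCALE = 1.0e8          # machine time / real time
DT_MACHINE = 4.0e-3         # integration interval in machine seconds
SOLUTION_REAL = 16.0e-9     # simulated span in real seconds
N_ELEMENTS = 4              # hybrid-element runs per step (assumption)

dt_real = DT_MACHINE / TIME_SCALE            # real time covered by one step
steps = round(SOLUTION_REAL / dt_real)       # integration steps per solution
runs = steps * N_ELEMENTS                    # analog OPERATE periods per solution

settle_slow, settle_fast = 20.0e-3, 1.0e-3   # IC settling times in seconds
saving = runs * (settle_slow - settle_fast)

print(f"real time per step  : {dt_real * 1e9:.2f} nsec")
print(f"integration steps   : {steps}")
print(f"settling-time saving: {saving:.1f} sec")
```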

A simplified problem using two half-nanosecond
NOR gates was run on the digital computer. The
program used was a general purpose circuit analysis
program using the state space approach. The resultant
waveforms had the same general shape as the ones
obtained on HYPAC, although exact evaluation of
the errors was not possible since the circuits were not
identical. The computing time for a 16 nsec solution
on the digital computer was one hour.


Improvements considered 

One of the main difficulties in using this program
is the selection of the appropriate integration interval
$\Delta T$. In order to simplify this task, two integration
algorithms will be incorporated into the HYPAC
program. The user will be able to select
the appropriate algorithm for his problem.

Integration algorithm for stiff differential 
equations 


As noted earlier, the Euler integration method is
stable only if $\Delta T < 2/\omega_{\max}$, where $\omega_{\max}$ is the largest
eigenvalue of the system. For systems where the frequencies
of interest are a few orders of magnitude smaller than
the largest eigenvalue, the use of this integration method
will result in extremely long computational runs.
As an example of such a system we can consider a
1 kHz oscillator using “100 MHz” transistors.

A backward integration method where the stability
is independent of the integration interval was recently
proposed by Sandberg and Shichman. 7 This algorithm
uses a Newton-Raphson iteration technique to solve
the implicit algebraic equation resulting from the
backward integration method. Preliminary investigations
suggest that a similar integration method can
be adopted for HYPAC.
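
A minimal sketch of the idea follows, for a single state variable. It is neither the Sandberg-Shichman algorithm itself nor HYPAC code: it applies the backward Euler rule y_{n+1} = y_n + ΔT f(y_{n+1}) and solves the resulting implicit equation by Newton-Raphson iteration, estimating the derivative by perturbing the unknown. The function names, tolerances and test equation are assumptions chosen for illustration.

```python
def backward_euler_step(f, y_n, dt, tol=1e-10, max_iter=20, delta=1e-6):
    """One backward Euler step: solve F(y) = y - y_n - dt*f(y) = 0 for y."""
    def residual(y):
        return y - y_n - dt * f(y)

    y = y_n                                  # starting guess: previous value
    for _ in range(max_iter):
        r = residual(y)
        if abs(r) < tol:
            break
        # Derivative of the residual estimated by perturbing y.
        slope = (residual(y + delta) - r) / delta
        y -= r / slope                       # Newton-Raphson update
    return y


def forward_euler_step(f, y_n, dt):
    """One explicit Euler step, for comparison."""
    return y_n + dt * f(y_n)


# Stiff test equation dy/dt = -1000*y, stepped at five times the Euler limit.
def f(y):
    return -1000.0 * y


dt = 0.01                                    # Euler limit would be 2/1000
y_back = y_fwd = 1.0
for _ in range(50):
    y_back = backward_euler_step(f, y_back, dt)
    y_fwd = forward_euler_step(f, y_fwd, dt)

print(abs(y_back))   # small: the backward solution has decayed
print(abs(y_fwd))    # very large: the explicit solution has diverged
```

The backward step stays bounded at an interval where the explicit step diverges, at the cost of an inner iteration per step.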

Let $Z_n$ denote the output vector of all the hybrid
elements, digital integrators and other time dependent
elements at time $t = n\Delta T$; then

$$Z_n = f(Y_n, X_n, \Delta T) \qquad (1)$$

where:

$Y_n$ is the input vector to the hybrid elements
and digital integrators.

$X_n$ is the initial condition vector of all the
integrators.

The input vector in turn is computed from the algebraic
equation

$$Y_n = g(Z_n, U_n) \qquad (2)$$

where $U_n$ is the input vector to the circuit.

Combining equations (1) and (2) we get

$$Z_n = f(g(Z_n, U_n), X_n, \Delta T) \qquad (3)$$

This is an implicit equation which can be solved for $Z_n$
using the Newton-Raphson method. For the k-th
iteration we get

$$F_k = Z_{n,k} - f(g(Z_{n,k}, U_n), X_n, \Delta T), \quad k = 0 \rightarrow K \qquad (4)$$

and

$$Z_{n,0} = f(g(Z_{n-1,K}, U_n), X_n, \Delta T) \qquad (5)$$

The equation is considered solved when $\|F_k\| < \epsilon$,
where $\|F_k\|$ is the usual Euclidean norm and $\epsilon$ a preset
error criterion.

The Jacobian will be computed by perturbing each
output $Z_{n,i}$ by $\Delta Z_{n,i}$ and computing all the partial
derivatives $\partial F_{n,j} / \partial Z_{n,i}$.

An additional benefit of this method is that hybrid
algebraic loops converge to a stable solution.

Integration algorithm with adaptive
integration interval

In order to increase the accuracy of the Euler integration
method, the error produced at the output
of each integrator for every integration step will be
monitored. The integration interval is then adjusted
so that this error remains within prescribed bounds.

The Euler integration method with a variable
integration interval is given by

$$Y_{n+1} = Y_n + \Delta T_n \dot{Y}_n$$

The error E is approximated by the second difference
of $Y_{n+1}$ (second term of the Taylor expansion):

$$E_{n+1} = Y_{n+1} - Y_n \left(1 + \frac{\Delta T_n}{\Delta T_{n-1}}\right) + \frac{\Delta T_n}{\Delta T_{n-1}} Y_{n-1}$$

Two error levels, $E_{\min}$ and $E_{\max}$, will be predetermined.
Each integration interval will be adjusted through
an iterative routine so that all integration errors will
be smaller than $E_{\max}$ and at least one larger than $E_{\min}$.

Such an adaptive integration method will keep the
errors within stable bounds and will reduce the errors
caused by truncation of the A/D converters.

CONCLUSION

A hybrid program has been developed which makes
it possible to analyze a special class of large circuits
considered intractable by conventional methods.
The topology of the circuit has to be modular and
composed of identical subcircuits.

Effective use of the program requires a degree of 
sophistication since the programmer has to be familiar 
with all the intricacies of both analog and digital 
simulation such as scaling and selection of the inte- 
gration interval. Although the setup of the problem 
is relatively time consuming, the reduction of compu- 
tation time compared to a wholly digital computa- 
tion solution is dramatic. This reduction of time makes 
it possible to perform optimization and tolerance 
analysis. 

The program has its most significant value in design 
of integrated circuits where modular topology is the 
standard design philosophy. 

APPENDIX

Partition of circuit for simulation purposes 

Simulation of circuits on the analog computer 
generally does not preserve the topology of the cir- 
cuit. Since every node or branch is described by two 
variables, voltage and current, these variables are 
handled separately on the computer and do not appear 
at the same place (Figure 16) . 

Therefore, if a circuit has to be partitioned into 
two (or more) parts, both variables, the voltage and 
current, have to be matched. The simplest way to do 
it is to replace each cut branch by dependent voltage 
and current sources as shown in Figure 17. 

The circuits 17a and 17b are equivalent when $i_b = i_a$ and
$v_b = -v_a$. $i_a$ is evaluated from Circuit I and used as
the current source in Circuit II. $v_b$ is evaluated from
Circuit II and used as the negative voltage source in
Circuit I.

As an example, let us consider the circuit in Figure
18. The circuit has to be partitioned along the dotted
line and simulated as two separate interconnected
circuits. The interconnection is replaced by a voltage
source $v_a$ and a current source $i_b$ as in Figure 18b. In
order for the circuits to be equivalent, $v_b = -v_a$ and
$i_a = i_b$.

The simulation of Circuit I and Circuit II and the
interconnection between them is shown in Figure 18c.

The choice of using voltage or current sources as
terminations of partitioned circuits is sometimes dictated
by the topology of the circuits, e.g., a capacitor
instead of the inductor in Circuit I will force one to
choose a current source as a termination; otherwise
one state variable will be eliminated.

Figure 16a — A circuit

Figure 16b — Simulation of a circuit

Figure 17a — Circuit

Figure 17b — Partitioned circuit

Figure 18c — Simulated circuit

REFERENCES 

1 R D BRENNAN H SANO
PACTOLUS — A digital analog simulator program
Proc FJCC Vol 26 1964

2 G A KORN T M KORN
Electronic analog and hybrid computers
McGraw-Hill 1964 Chapt 3

3 P HENRICI
Error propagation for difference methods
John Wiley 1963

4 G A BEKEY W J KARPLUS
Hybrid computation
John Wiley 1968

5 L A O’NEILL
Adaptive detection and representation
Johns Hopkins Univ 1966 PhD Thesis

6 P BALABAN J LOGAN
Analog computer simulation of semiconductor circuits
Proc SJCC Vol 32 1968

7 I W SANDBERG H SHICHMAN
Numerical integration of systems of stiff nonlinear differential
equations
BSTJ Vol 47 No 4 April 1968



A time shared I/O processor for real-time
hybrid computation

Bolt Beranek and Newman Inc.
Cambridge, Massachusetts 


INTRODUCTION 

There are economic advantages to time-sharing a fa- 
cility with hybrid resources. It is quite unlikely that 
any single hybrid problem will be able to utilize all of 
the system resources 100 percent of the time. This is 
the same kind of reasoning that leads one to consider 
time-sharing for conventional digital problems. How- 
ever, time synchronous real-time hybrid time-sharing 
and non-synchronous non-real-time digital time- 
sharing are quite different problems, with the former 
posing some considerable difficulty to sequential digital 
machines. 

A sequential machine can perform only one operation 
at any single instant of time. Time-sharing is thus
accomplished by dividing the available time amongst 
several tasks, and by task switching from one task to 
another at judicious times. Most sequential machines 
cannot perform this switch rapidly. Conventional 
solutions to real-time problems utilize the central 
processor of the system for both hybrid computation 
and I/O; the central processor is usually a sequential 
machine. These solutions require rapid and frequent 
switches of the attention of the CPU and, as a conse- 
quence, are expensive of machine time and are in- 
accurate in their timing. 

To avoid these disadvantages we have separated the 
hybrid I/O from the hybrid computation. The hybrid 
I/O is handled by a special processor called the Hybrid 
I/O Processor 1 while the hybrid computation is performed
by the central processor. This separation gives
us a system which can service several simultaneous
real-time hybrid problems efficiently and accurately,
as well as several non-real-time problems concurrent
with the hybrid problems.

Hybrid problems 

When we speak of real-time demands, we generally 
visualize applications where the computer system is 
required to interact with an experiment which is being 
conducted in the real world in real-time. While these 
experiments are certainly examples of real-time de- 
mands, we extend the definition to include any external 
demands of the computer system where the reaction 
time to a demand must be relatively small or where 
the timing of events must be precise. 

Data generation and acquisition 

The simplest demand which fits the real-time cate- 
gory is the data acquisition or data generation problem. 
Here the computer system is called upon to sample or 
generate repetitively an analog signal. Data rates as 
high as 20 KHz are typical for speech sampling or 
generation. Much lower rates are typical in manual 
control experiments. Sampling or generation is generally 
performed periodically at a fixed rate; thus, most of 
these problems may be considered synchronous. How- 
ever, in some cases the initiation of the sampling is 
not synchronized to the computer system. For example, 
the sampling of a recording on an analog magnetic tape 



782 Fall Joint Computer Conference, 1969 


may have to start when a control signal is read off 
the tape. The computer system generally has no control 
over this start time, and this sampling problem is now 
asynchronous to the computer system because its initi- 
ation cannot be predicted or prescheduled. 

Hybrid simulations 

Hybrid simulation problems are also in the real- 
time category. Generally an analog computer will per- 
form the linear portions of the simulation while the 
digital computer will perform the non-linear portions. 
The digital portion of the problem is often charac- 
terized by the following synchronous behavior: 

1. Sampling a set of input values from the analog 
computer at a specific time. 

2. Performing some digital computation utilizing 
these input values. 

3. Setting up some output values for the analog 
computer at a specific time. 

Interactive data generation and/or acquisition 

Display generation problems are often in the real- 
time category. This is especially true when displays 
are used to simulate real world activity such as the 
view of a runway while landing an aircraft or the 
view of the instruments of a control panel. Display 
generation problems are much like any waveform gener- 
ation problem, but the data rate is generally high, and 
display problems often tend to present asynchronous 
demands such as the generation of a new display in 
response to the changes in a manual control by a 
human operator. 

These three categories of real-time problems classify 
most common real-time demands. 

Conventional hybrid systems 

Separation of the computation and the I/O 
United Aircraft Facility 

Time-shared hybrid systems have existed since 1963 
when Belluardo et al. at United Aircraft Corporation 
developed a hybrid computation facility 2 based on the 
DEC PDP-6 computer system. This facility handles 
several hybrid problems simultaneously, but the prob- 
lems must have very similar demands. They must be 
synchronous problems with the same base repetition 
frequency. The maximum base repetition rate is about 
6 Hz, which is quite slow. 

Hybrid I/O is performed by the PDP-6 arithmetic 
processor in response to I/O commands issued directly 


by the user programs. These I/O commands are pro- 
tected in that they cannot affect other users’ hybrid 
devices. 

Each of the problems being serviced by the system is 
assigned a particular start time, offset from the base 
period start time by a fixed number of real-time clock 
ticks, which are 1/60 second apart. It is assigned as 
many consecutive “clock tick” slots as needed to run. 
An example of this type of scheduling is shown in 
Figure 1. 

MIT-ESL-hybrid facility 

Connelly developed a hybrid facility 3,4 at the MIT 
Electronic Systems Laboratory on the DEC PDP-1 
computer. This facility also time-shares several hybrid 
problems which are synchronous in their timing de- 
mands. However, the repetition frequencies of these 
problems are considerably more flexible. The scheduling 
procedure utilizes a time-slot algorithm with the time- 
slot duration about 10 times larger than the computer’s 
switching time or about five milliseconds. 

The repetition frequencies of the problems must all 
be members of a “synchronizable set” which means 
each element of the set is a common multiple of all 
smaller elements. Such a set of periodic processes is 
easily scheduled by dividing time up into the greatest 
common divisor of all the process periods. Like 
the United Aircraft scheduler, a process gets its first 
time slot at exactly synchronized times, but it may 
get several temporally disjoint slots before it completes 
its computation for each basic repetition. An example 
of time slot scheduling is shown in Figure 2. 
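
The synchronizable-set condition can be checked mechanically. The following is a minimal sketch in Python, assuming the periods are given as integer multiples of some clock unit; the function names and the sample periods are illustrative only and are not taken from the MIT-ESL software.

```python
from functools import reduce
from math import gcd


def is_synchronizable(periods):
    """True if every period is a multiple of every smaller period."""
    ordered = sorted(periods)
    # Checking adjacent pairs suffices because divisibility is transitive.
    return all(b % a == 0 for a, b in zip(ordered, ordered[1:]))


def base_interval(periods):
    """Greatest common divisor of the periods.

    For a synchronizable set this equals the shortest period.
    """
    return reduce(gcd, periods)


periods_ms = [5, 10, 20, 40]            # hypothetical process periods
print(is_synchronizable(periods_ms))    # True
print(base_interval(periods_ms))        # 5
print(is_synchronizable([10, 15]))      # False: 15 is not a multiple of 10
```

A set such as 10 and 15 fails the test even though a common base interval (5) exists, which is why the condition is stated in terms of multiples rather than a common divisor alone.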

The hybrid I/O is performed by the PDP-1 central 
processor, again in response to protected, direct I/O 
commands from the user. It is important to realize 
that with this type of scheduling, the I/O commands 
must be issued at the start of each repetition period 
since this is the only time-slot for which the user can 
Figure 2 — Example of MIT-ESL time-slot scheduling 

be guaranteed a specific start time (at the beginning 
of his repetition period). The relative placement of the 
other time slots is dependent upon the other problems 
being serviced by the system. 

The conventional hybrid systems to date are repre- 
sented by the examples given above. These systems 
will not handle either asynchronous or high rate real- 
time problems. 

Why conventional systems cannot handle 
high rate problems 

A general purpose digital computer when performing 
real-time I/O consumes a great deal of its capacity 
since it was not designed for such a task. A general 
purpose processor is not well-suited to performing real- 
time I/O because it cannot switch its attention between 
tasks rapidly enough. Switching times on the order of 
a half-millisecond for user processes, and 20 micro- 
seconds for servicing interrupt requests are typical of 
conventional time-shared general purpose processors. 
This means the overhead of performing real-time I/O 
with these processors is prohibitive, especially if the 
real-time I/O rate is high. 

A general purpose processor is also not well-suited 
to accurate timing control of relatively high rate of 
I/O. Since current central processors are generally 
asynchronous machines, it is difficult to time operations 
precisely with a central processor. Further timing skew 
is introduced by the tendency of general purpose time- 
sharing systems to get themselves into non-inter- 
ruptible states for short periods. These occur when 
the interrupt system is disabled, or higher priority 


interrupts are processed or the user process’s run 
period is interrupted. 

It would seem that processing all real-time I/O on 
the highest priority interrupt channel triggered by 
real-time clocks would be the most desirable method 
for CPU-performed real-time I/O since this seems to 
minimize skew and overhead. However, one finds it is 
impossible to make every device the “highest priority” 
in the interrupt system, and, in fact, there may be 
other system constraints which force some real-time 
I/O to be on a lower priority interrupt channel. 

Another limitation of interrupt driven real-time I/O 
is that, in general, one interrupt request will initiate 
several real-time I/O data transfers. This means the 
time spent in the interrupt service routine for per- 
forming the real-time I/O varies with the number of 
data transfers to be performed. 

These interrupt requests will tend to queue up unless 
their interrupt service routines are scheduled to be 
non-overlapping. 

The precise scheduling of both interrupt-driven and 
user-initiated real-time I/O is very difficult unless all 
of the real-time I/O demands are exactly periodic and 
synchronized to the clock system of the CPU. The 
source of this difficulty is the inherently slow reaction 
time of a general purpose processor to random requests 
for service. Only when all requests are guaranteed to 
be exactly periodic can they be scheduled in advance, 
and only then can the CPU be prepared to process 
them. 

We conclude that when real-time I/O is to be per- 
formed with precise timing, when the degradation of 
CPU performance from CPU driven real-time I/O 
cannot be tolerated, or when both synchronous and 
asynchronous demands must be handled, real-time 
I/O should not be performed by the conventional 
CPU driven methods. 

Separability of real-time I/O and computation 

It is clearly possible to design and build a special 
purpose processor for performing the I/O, but we must 
first show that it is meaningful and useful to do this. 
Carefully consider the real-time problem classes pre- 
viously mentioned. It is evident that most real-time 
problems place very stringent demands on I/O timing, 
but less stringent demands on the related CPU com- 
putation. That is, the sampling of input data and the 
conversion of output data to analog signals must be 
done at precise times, whereas the computation may 
be done any time after the input operation provided 
that the output data is ready before it is converted to 
analog signals. In addition, the I/O operations do not 
require the full talent of a general purpose CPU, but 
do require better timing accuracy than the CPU is 
capable of providing. Therefore, it seems feasible and 
in fact desirable, to separate the I/O from the com- 
putation by providing separate hardware. 


The BBN hybrid processor 

Our system performs all real-time I/O through a 
device which we call a Hybrid Processor. It is a special 
purpose processor with the following important charac- 
teristics: 

1. Real-time I/O is performed directly between 
core memory and hybrid devices. 

2. The Hybrid Processor is multiplexed among 
several (up to four) “processes” thus allowing 
several independent, concurrent hybrid inter- 
actions. 

3. The switching time between processes is very 
small (approximately 100 nanoseconds), be- 
cause the information about the state of a 
process is small and can be changed rapidly, 
and because the scheduling of which process to 
run is performed by the hardware. 

4. The time required to perform a single real-time 
I/O transfer is kept very short (approximately 
20 µsec) because of the special purpose nature 
of the processor. 

5. All real-time I/O interactions are handled in 
a uniform manner, thus requiring no hardware 
or software changes to incorporate a new hybrid 
I/O device. 

Command and data structure 

The Hybrid Processor implemented for an SDS-940 
time-sharing system operates on tables of commands 
and data. Each command, as shown in Figure 3, is 
paired with a data word. The nature of the command 
tells the Hybrid Processor whether the data word is 
for input or output. 

The device type field specifies the types of devices 
to which the Hybrid Processor talks. These include 
Analog to Digital Converters, Digital to Analog Con- 
verters, Digital level Inputs, and Digital level Outputs. 
The device number field selects a particular device 
within a type, such as a channel of the Analog to 
Digital Converter. 

After the command is executed and the data word 
accessed, the command and data table pointers are 
both indexed by one, and the remaining word counts 
for the tables are both decremented by one. The com- 
mand and data table structure and pointers are shown 
in Figure 4. 

[Figure 3 (diagram): layout of a command word, with bit 
positions numbered up to 23. Fields, left to right: 
CONTROL (bits R, H, C, E); TIMING (Δt, in 10 µsec 
intervals); DEVICE SELECTION (DEVICE TYPE, DEVICE NUMBER). 
R = restart 
H = halt and interrupt CPU 
C = cycle command table 
E = wait for external signal 
Device type: 00 = A/D, 01 = D/A, 10 = D/D in, 11 = D/D out; 
127 devices.] 

Figure 3 — Hybrid processor commands 

There is no fixed relationship between lengths or 
positions of commands and data tables. In fact, most 
often the command table is quite short and is recycled 
many times repetitively in order to fill up a large data 
table. In order to make this cycling operation efficient, 
some of the control bits are used to initiate the exchange 
of the cycle pointers with the current pointers. The C 
bit is used to specify cycling the command table 
pointers while the condition current data word count 
= 0 is used to specify cycling the data table pointers. 

The Δt field is used to specify the time, in 10 µsec 
intervals, between the current command and the next 
command. This value may be thought of as being 
placed into a clock which is a down-counter operating 
at 100 KHz. When the counter reaches 0, a request for 
service is initiated. It is important to remember that 


[Figure 4 (diagram): COMMAND TABLE and DATA TABLE] 

Figure 4 — Command and data table structure 

the Hybrid Processor is in general running several 
hybrid processes. When a process requests service, it 
is quite likely that another request may be in progress 
or that a higher priority request may be granted first. 
In order to prevent conflicts from introducing cumu- 
lative timing skew, the individual process clocks are 
designed to count through 0 to negative values, and 
the Δt field is actually added to the contents of the 
process clock when the command word is fetched. As 
long as this addition yields a positive 
quantity, the process will not be subject to cumulative 
skew and will be accurately timed on the average. We 
do a bit better than this, however, by taking advantage 
of the fact that the clock tick frequency is very ac- 
curately crystal-controlled and that the clock = 0 
pulse is a very precisely timed event at exactly the 
Δt intervals. This pulse is amplified and available to 
users for patching. It can be used to initiate a sample 
and hold gate, for example, or to cause the transfer 
between buffers in a double buffered D/A converter. 
This means that extremely precise timing control, with 
a resolution of 10 µseconds and crystal accuracy, is 
achieved. 
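
The effect of adding Δt to a clock that has run negative, rather than overwriting the clock, can be illustrated with a small simulation. This is a sketch in Python under simplifying assumptions: time is counted in whole clock ticks, each request waits a given number of ticks before its command word is fetched, and every wait is shorter than Δt. The function name and the sample numbers are hypothetical.

```python
def zero_crossings(delta_t, waits, additive=True):
    """Return the tick numbers at which the process clock reaches 0.

    delta_t  -- interval between commands, in clock ticks
    waits    -- ticks each request waits before its command is fetched
    additive -- True: add delta_t to the (negative) clock, as in the text;
                False: overwrite the clock with delta_t, for comparison
    """
    now = 0
    clock = delta_t
    crossings = []
    for wait in waits:
        now += clock              # clock counts down to 0
        crossings.append(now)
        now += wait               # request waits; clock runs negative
        clock = -wait
        clock = clock + delta_t if additive else delta_t
    return crossings


waits = [0, 3, 1, 2]
print(zero_crossings(10, waits, additive=True))    # [10, 20, 30, 40]
print(zero_crossings(10, waits, additive=False))   # [10, 20, 33, 44]
```

With the additive rule the clock = 0 events stay on the exact 10-tick grid however long each request waits; with the overwrite rule every wait is carried forward and the skew accumulates.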

The remainder of the control bits in a command 
word are used to stop (H) or restart (R) a process. An 
external signal control bit (E) enables a temporary 
stop to wait for a signal external to the computer and 
the Hybrid Processor to restart the process. 

Connection of hybrid processor to SDS-940 

The Hybrid Processor is attached to the SDS-940 
via a Data Multiplexor Channel (DMC) 5 with a modi- 
fied Data Sub-Channel II (DSC II) 1, as shown in 
Figure 5. The status of a block transfer for a DMC 
sub-channel is normally contained in one of two “in- 
ternal interlace words” which are located in fixed 
adjacent positions in core memory. These interlace 
words contain the remaining word count in the left- 
most bits and current location of the block transfer in 
the rightmost bits. The economy of using core memory 
words instead of flip-flop registers is quite important, 
and this economy is retained by the Hybrid Processor. 
However, the DSC II has been modified so that the 
locations of these interlace words are no longer fixed 
but are uniquely determined by the particular process 
selected for service. That is, the first process (Process 
A) uses words n,* n+1 as interlace words; the second 
process (Process B) uses words n+4, n+5, etc. Also 
during the command fetch, the even interlace word is 


* where n is a memory address which is 0 MOD 4 
Figure 5 — Connection of hybrid processor to SDS-940 

used as the current command pointer word, while 
during the data fetch or store, the odd interlace word 
is used as the current data pointer word. Note that, as 
before, the state of the block transfers is completely 
contained in the memory interlace words. 

Each hybrid process can switch command and data 
tables by using the “cycle” interlace words. Every 
process actually has four interlace words. Words n and 
n+1 are the current command and data table interlace 
words respectively. Words n+2 and n+3 are the cycle 
command and data table interlace words. A process 
can cause the contents of its cycle interlace words to 
be moved into the current interlace words. This ef- 
fectively switches the command and/or data tables to 
new core areas. This switch is accomplished without 
CPU intervention, but the CPU must establish new 
cycle words soon after cycling occurs if the next cycle 
operation is expected to switch to yet another core 
area. In fact, a safety interlock will abort a process 
and signal an error interrupt if a cycle attempt is made 
before the previous cycle operation was properly re- 
sponded to by the CPU. More details of the BBN 
Hybrid Processor implementation on the SDS-940 are 
available in another document. 1 
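
The interlace-word layout and the cycle interlock described above can be modeled in a few lines. The sketch below is in Python and is deliberately simplified: the base address, the dictionary standing in for core memory, the single "fresh" flag, and all names are assumptions made for illustration, and command and data cycling are treated together although the hardware can cycle them separately.

```python
BASE = 0o100  # hypothetical address n of Process A's first interlace word


def interlace_words(process_index, base=BASE):
    """Addresses of the four interlace words for process 0 (A), 1 (B), ..."""
    if base % 4 != 0:
        raise ValueError("n must be 0 mod 4")
    n = base + 4 * process_index
    return {"current_command": n, "current_data": n + 1,
            "cycle_command": n + 2, "cycle_data": n + 3}


class InterlaceState:
    """Simplified model: memory is a dict, one 'fresh' flag per process."""

    def __init__(self, memory, process_index):
        self.mem = memory
        self.addr = interlace_words(process_index)
        self.cycle_fresh = False   # True once the CPU has loaded cycle words

    def cpu_load_cycle_words(self, command_word, data_word):
        self.mem[self.addr["cycle_command"]] = command_word
        self.mem[self.addr["cycle_data"]] = data_word
        self.cycle_fresh = True

    def cycle(self):
        """Copy cycle words into current words without CPU intervention."""
        if not self.cycle_fresh:
            # Stands in for the safety interlock (abort plus error interrupt).
            raise RuntimeError("cycle attempted before CPU responded")
        a = self.addr
        self.mem[a["current_command"]] = self.mem[a["cycle_command"]]
        self.mem[a["current_data"]] = self.mem[a["cycle_data"]]
        self.cycle_fresh = False


memory = {}
b = InterlaceState(memory, process_index=1)       # Process B
print(interlace_words(1)["current_command"] - BASE)   # 4, i.e. word n+4
b.cpu_load_cycle_words("cmd-table-2", "data-table-2")
b.cycle()
print(memory[b.addr["current_command"]])          # cmd-table-2
```

A second call to `b.cycle()` before the CPU reloads the cycle words raises the error, mirroring the abort described in the text.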

Suppose the maximum time to service a hybrid I/O 
request were a. This time would be measured from the 
start of processing of a request by the Hybrid Pro- 
cessor to the completion of this request. Then a con- 
servative estimate of the bandwidth of the Hybrid 
Processor would be 1/a. 

A simple scheduling technique would involve selling 
fractions of this bandwidth to users. Each user i would 
buy 1/a_i of the bandwidth of the Hybrid Processor, 
such that 

$$\sum_{\text{all } i} \frac{1}{a_i} \le \frac{1}{a}$$ 

It would now be necessary for the Hybrid Processor 
(or whatever controls the Hybrid Processor) to insist 
that user i never perform hybrid I/O faster than a_i 
seconds between hybrid interactions. This simple check 
could be performed by either hardware or software. 
This technique would guarantee that the Hybrid 
Processor would never be over-committed. 

On the SDS-940 implementation of the Hybrid 
Processor, the maximum a is approximately 20 µsec.* 
This means the Processor has a guaranteed bandwidth 
of 50 KHz. 
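
The bandwidth rule lends itself to a simple admission check. The following Python sketch assumes each user declares a minimum interval a_i in microseconds and uses the 20 µsec worst-case service time quoted above; the function name and the sample intervals are hypothetical.

```python
def bandwidth_ok(min_intervals_us, service_time_us=20.0):
    """Check that sum(1/a_i) <= 1/a for user minimum intervals a_i."""
    demand = sum(1.0 / a_i for a_i in min_intervals_us)
    return demand <= 1.0 / service_time_us


# Guaranteed bandwidth for a = 20 µsec, in transfers per second.
print(1e6 / 20.0)                      # 50000.0

print(bandwidth_ok([50, 100, 100]))    # True:  0.02 + 0.01 + 0.01 <= 0.05
print(bandwidth_ok([40, 40, 100]))     # False: 0.025 + 0.025 + 0.01 > 0.05
```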

Simultaneous requests for service are resolved by a 
simple priority network which selects the highest 
priority process currently requesting service to run 
(Process A is highest priority, B, C, D are in order of 
decreasing priority). This means the higher priority 
processes see the least instantaneous skew, but none will 
see any cumulative skew if the bandwidth scheduling 
rules are adhered to, and none will see instantaneous 
skew if the clock = 0 pulse and sample/hold gates are 
used correctly. 


Device and timing protection 

The Hybrid Processor commands reference all 
hybrid devices in an unrestricted manner. If the user 
is given direct access to these commands, he could 
detrimentally affect another user's experiment by 
changing a value on another user's D/A converter, 
for example. Also, one user could easily lock out Hybrid 
Processor service from another user if he had a higher 
priority process and usurped all of the Hybrid Pro- 
cessor's capacity. It is therefore not feasible for the 
user to construct his own command tables. Instead, 
the time-sharing monitor constructs these tables for 
the user and keeps them in monitor core. The monitor 
makes certain that a user does not access another 
user's devices or usurp all of the Hybrid Processor's 


* for D/A conversion; approximately 30 µsec for A/D conver- 
sions. 


time. The data tables are, however, kept in the user's 
own address space. 

Hybrid processor software 

Some very elaborate software (with about 2K words 
of machine language code) exists in our time sharing 
monitor on the SDS-940 for controlling the Hybrid 
Processor. This software locks and unlocks pages of 
data tables into core; sets up the transfers of data table 
pages from core to drum and vice versa; worries about 
anticipating pages before they're needed and getting 
the drum requests on a high priority drum queue; and 
provides a convenient handle on the hybrid processor 
for users. 

The user interface to the hybrid processor is pro- 
vided by some SYSPOP's, 5 which permit the user to 
assign and deassign hybrid devices; assign and deassign 
hybrid processes; define a sequence of command and 
data tables to be executed a specified number of times; 
specify a prototype of the command table which gets 
set up in the monitor’s address space; specify the 
boundaries of data tables in the user's address space; 
start and stop processes; and interrogate the status 
of assigned processes. 

Real-time CPU usage 

With a Hybrid Processor I/O system, user programs 
or user I/O need not be periodic. The I/O can be 
precisely timed using the Hybrid Processor independent 
of CPU activity. Therefore, it no longer is necessary 
to start CPU computation at exact times. 

Suppose that for each process the following para- 
meters were specified: 6,7 

1. T The period of the process (exact period if 
synchronous, minimum period if asynchronous). 

2. P The maximum amount of CPU time the 
process may require each period. 

3. D The maximum tolerable delay between the 
moment the process requests service and the 
time when all servicing has been completed 
(most synchronous processes would allow service 
to be completed any time during the period 
i.e., D = T). 

4. Whether the process is synchronous or asynchro- 
nous. 

Using this characterization, the demand of the 
process upon the system might be phrased as follows: 
“When my process requests service it must be granted 
P seconds of CPU time within D seconds of when the 
request is made. My process will never request service 
sooner than T seconds after the previous request.” 
The parameters P, D, T, and the specification of 
whether or not the process is synchronous enable the 
system to decide whether the demands of this process 
(and all others) can be successfully met. The system 
cannot, of course, guarantee service to a set of real- 
time processes with arbitrary P's, D's, and T's. In 
fact, two restrictions are obvious: 

$$0 < P_i \le D_i \le T_i \qquad (1)$$ 

and 

$$\sum_{\text{all processes } i} \frac{P_i}{T_i} \le 1 \qquad (2)$$ 

If the sum in (2) were greater than unity, it would 
be possible for the real-time processes to require more 
than 100 percent of the available CPU time. 

The scheduling algorithm used to select which pro- 
cess runs at any time is intimately related to the 
guarantees which the system can make to a set of 
users. It would be desirable to find a scheduling algo- 
rithm which would allow: 

$$\sum_{\text{all processes } i} \frac{P_i}{T_i} \quad \text{to be close to } 1$$ 

and would minimize the amount of switching between 
processes to reduce overhead. It can be shown that if 
switching time is negligible, no algorithm can do a 
better job of scheduling for synchronous or asynchro- 
nous processes than the following: 

Run the process which must be completed soonest 

That is, whenever a process requests service, the 
system computes the time when the process must 
complete service (τ_i), which is equal to the current 
time plus D_i. The system then decides to run the pro- 
cess with the minimum τ_i. Whenever servicing is com- 
pleted or aborted (for trying to use more than P_i of 
CPU time) the system runs next the process with 
minimum τ_i. This algorithm and the necessary and 
sufficient conditions under which the system can under- 
take to run a set of processes are discussed in detail 
by Fiala. 6 
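
The two restrictions and the selection rule can be sketched in code. The Python below is an illustration only: the class and function names are hypothetical, the admission test covers just restrictions (1) and (2), which are necessary but not the full conditions discussed by Fiala, and switching time is ignored. The rule itself is what is now commonly called earliest-deadline-first scheduling.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    P: float   # maximum CPU time needed per period
    D: float   # maximum tolerable delay from request to completion
    T: float   # period (minimum period if asynchronous)


def passes_restrictions(procs):
    """Restrictions (1) and (2): 0 < P <= D <= T and sum(P/T) <= 1."""
    ordered = all(0 < p.P <= p.D <= p.T for p in procs)
    return ordered and sum(p.P / p.T for p in procs) <= 1


def pick_next(pending):
    """pending: list of (must_complete_by, process) pairs.

    Returns the process that must be completed soonest, or None.
    """
    if not pending:
        return None
    return min(pending, key=lambda item: item[0])[1]


a = Process("A", P=1, D=4, T=4)
b = Process("B", P=2, D=8, T=8)
print(passes_restrictions([a, b]))        # True: 1/4 + 2/8 = 0.5

now = 0
pending = [(now + b.D, b), (now + a.D, a)]   # both request service at time 0
print(pick_next(pending).name)            # A
```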

Costs 

The Hybrid Processor is not an inexpensive device. 
Approximately $20K of digital hardware components 
are necessary for a Hybrid Processor, not including 


any of the analog or hybrid equipment. The labor 
involved in designing and implementing the hardware 
and software is approximately 1¼ man-years. We 
believe this cost is justified by the utility of the pro- 
cessor. 

Future work 

Future hybrid processor revisions 

Several changes will be made in our new hybrid I/O 
system for our next research computer (a DEC 
PDP-10). 8 These changes will increase the total available 
bandwidth, improve the command/data flow control 
so that even less CPU capacity will be required to 
direct the Hybrid Processor, make several improve- 
ments to the clock system, etc. 

New clock system 

A 36-bit time of day clock will be implemented 
which counts at 100 KHz. It will be possible to read 
this time via an I/O input command over the PDP-10 
I/O bus. This 36-bit count will recycle approximately 
every eight days. 
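
The eight-day figure follows directly from the counter width and tick rate. A short check in Python (the variable names are illustrative):

```python
TICK_HZ = 100_000            # clock rate stated in the text
COUNTER_BITS = 36

wrap_seconds = 2**COUNTER_BITS / TICK_HZ
wrap_days = wrap_seconds / 86_400
print(round(wrap_days, 2))   # 7.95
```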

At least two 36-bit “alarm” registers will be used 
in conjunction with the clock. These registers will be 
compared with the values in the clock after each 
“tick” settles down. If a match on any register is 
found, the following events will occur: 

A. A CPU interrupt request will be generated so 
that a new 36-bit value may be placed in the 
alarm register and any CPU action which was 
to be initiated at this time will be triggered 
(such as the scheduling of a new process to 
run). 

B. Each alarm register will have eight enable bits 
whose set output will be gated with the alarm 
pulse and this gated result will be buffered and 
available for patching to trigger external devices. 

The control of the alarm registers will require the 
use of a PDP-10 I/O output instruction to the selected 
alarm register to set any combination of the eight 
enable bits followed by a PDP-10 I/O output in- 
struction to the selected alarm register to set up the 
36 bits of the register itself. 

Hybrid I/O 

The hybrid I/O capability will be quite similar to 
the capability of our current SDS-940 Hybrid Pro- 
cessor. The channel will operate on command and data 
tables with each command paired with a corresponding 
word in the data table. 

The command format will also be similar to the 
current Hybrid Processor on our SDS-940. However, 
the flow through these command and data tables will 
be directed by two new tables per process called the 
command and data flow tables. These will replace the 
“cycling” operations by “driving” the Hybrid Pro- 
cessor through command and data tables a specified 
number of times. The “cycling” operation had the 
disadvantage of putting a large burden on the CPU 
for processes which cycle often (which proved to be 
true for many processes). 

We will also implement the command and data 
table pointer words in hardware to increase Hybrid 
Processor bandwidth. 

CONCLUSIONS 

The use of a Hybrid Processor permits many real- 
time experiments which were not possible in the past, 
and are not possible on other real-time computer 
systems. We are able to handle high speed as well as 
asynchronous hybrid interactions. Most of this is 
made possible by the separation of the real-time I/O 
functions from the computation function. The real- 
time I/O functions are performed by a processor 
especially designed to handle real-time I/O, and the 
computations are performed by a general purpose pro- 
cessor. 
INTRODUCTION 

An on-line software checkout facility for special 
purpose computers (referred to as the Flight Software 
Development Laboratory) has been created to aid 
programmer/engineers in the development of programs 
that will operate in a spaceborne computer aboard the 
Apollo/Saturn IB and V Launch Vehicles. The Flight 
Computer operates as an integral part of various 
vehicle subsystems in the Instrument Unit (IU). The 
subsystems provide onboard navigation, guidance, con- 
trol, sequencing, data compression, and ground com- 
munications. These functions are illustrated in Figure 
1. Continued emphasis is placed on error-free flight 
software, since it is an essential element in overall 
vehicle performance. No opportunity exists to test or 
exercise the flight program in its actual flight en- 
vironment prior to a mission. Therefore, to ensure the 
integrity of the flight program, simulators are used to 
accomplish flight testing. The purpose of this paper is 
to present the organization of one such simulator that 
has been created for the sole purpose of the develop- 
ment and checkout of Saturn flight software. The 
emphasis throughout the design and implementation 
of the Laboratory has been that it must be user- 


* This work was performed under contract with NASA’s Marshall 
oriented for program checkout. Before the existence of 
the Laboratory, available facilities for checking out 
flight programs were oriented to hardware checkout. 
Although such facilities can be, and have been, rigged 
for program checkout, they have not provided the type 
of assistance required to produce the quality of soft- 
ware demanded by spaceborne computers. The Labo- 
ratory is believed to be unique in the capabilities it 
provides to the programmer/engineer in controlling and 
affecting the operation of the Flight Computer in a 
real-time environment. 

Flight software development begins with a set of 
explicit engineering requirements: equation and logic 
definition, range of variables, and expected performance 
data. After an intensive analysis of the requirements, 
the flight software is designed and organized to meet 
these engineering requirements with minimal flight com- 
puter memory and reasonable flexibility. After the 
flight program has been flowed, scaled (fixed point 
computer), coded, assembled, and checked out by the 
program unit or module, the flight phases are integrated 
and checked out. This process continues until the entire 
flight software has been integrated. The procedure 
described above requires that the programmer/engineer 
be able to measure and evaluate his progress in an 
efficient manner. The purpose of this laboratory facility 
is to provide the programmer/engineer with a user- 
oriented tool by which he is able to test and evaluate 
his programs in a simulated flight environment, using 
an actual spaceborne computer and interface hardware. 
This enables him to measure and evaluate flight soft- 
ware performance against the engineering requirements 
for the many vehicles and environmental variations. 

The Laboratory user must produce quality software 
in the shortest possible time frame. The key objective 
in designing the Laboratory was to provide accurate 
simulation models in the form of user-oriented tools. 
Thus, the Laboratory user can swiftly determine the 
progress and results of his work through real-time man- 
computer interaction. The computer offers data, 
counsel, and guidance to the man, who in return sup- 
plies certain indispensable knowledge of the overall 
system. Systems reliability and effective communi- 
cations between the Laboratory and user play a major 
role in establishing user confidence. Operating experi- 
ence in the Laboratory has clearly demonstrated that 
these objectives have been satisfied. 

Hardware configuration 

The Laboratory has as its main hardware components 
an IBM System/360 Model 44, linked through a special 
purpose interface to a Saturn Launch Vehicle Digital 
Computer and Launch Vehicle Data Adapter. An 
IBM 2250 Display Unit is employed as an integral 
part of the Laboratory, providing two-way man- 
computer communications. Figure 2 illustrates the 
organization of the hardware components and in general 
indicates the basic paths of information flow. 

One high speed multiplexer channel has been dedi- 
cated to the flight hardware interface. Each of the 
subchannels is likewise dedicated, as shown in Figure 
2. The dedicated channel and subchannels minimize 



Figure 2 — Flight software development laboratory— 
Block diagram 


interference from other I/O activities and enable the 
creation of a special low overhead channel scheduler. 
These features incorporated with the 32-level priority 
interrupt scheme make the Model 44 highly responsive 
to the real-time interface requirements. The other high 
speed multiplexer channel is dedicated to disks that 
support real-time data collection and permit fast 
access for the display system. 

In this particular application, six of the 32 levels of 
priority interrupt are used by external hardwired equip- 
ment. The others are used by internally generated 
software functions for scheduling time-dependent soft- 
ware functions. 

The Launch Vehicle Digital Computer and Launch 
Vehicle Data Adapter are the two flight components 
that have been integrated into the Laboratory. 

The Flight Computer is a general purpose computer 
which, under control of a stored program, processes 
data serially, using fixed-point 2’s complement 
arithmetic. 

The Launch Vehicle Data Adapter serves as an 
input/output device for the Flight Computer and the
central station for the signal flow in the Saturn Astrionics
System, which is illustrated in Figure 1. The Data
Adapter accepts discrete input signals from the stage 
switch selectors, Instrument Unit command receiver, 
ground launch computer, telemetry computer inter- 
face unit, telemetry data multiplexer, control dis- 
tributor, and other vehicle equipment. It has output 
registers to provide discrete output signals to the 
above-mentioned equipment. It also accepts and processes
computer interrupt signals from the ground
launch computer and Instrument Unit equipment. 

The interface unit provides all the normal ground 
and flight communications paths between the flight 
hardware and the central processor. However, this 
interface was designed to go beyond these requirements. 
The interface is unique in that it was designed to place 
emphasis on (1) minimizing the central processor inter- 
face traffic and (2) maximizing user visibility by giving 
the user the control of internal flight hardware oper- 
ations and the access to information internal to the 
Flight Computer. Also, the unit was designed for ease 
of maintainability. Specifically, three major capabilities
have been incorporated into the interface unit.
First, the interface unit has been designed so that it 
can control the internal operation and timing of the 
Flight Computer and Data Adapter. Secondly, the 
interface contains special hardware, oriented toward 
supporting flight program debug as opposed to program 
verification, which is an independent program audit 
function performed using the debugged programs. 
Finally, the interface unit has been designed so that 
extensive automatic diagnostics can be run from the 
central processor to isolate suspected interface failures.

The IBM 2250 Display Unit is organized around a 
cathode ray tube on which computer-programmed 
graphic and alphameric information is displayed at 
high speeds. This provides visual communication be- 
tween the computer and the user. In addition, key- 
boards and a light pen provide the user with a versatile 
means of entering and modifying computer information. 
With the display system, the user has direct and rapid 
access to stored data which can be selected, processed, 
modified, and displayed in alphameric and graphic 
representation. For example, the user can display and 
modify memory in both the Model 44 and the Flight 
Computer through the display unit. 

The display unit was configured to minimize central 
processor time and core requirements on the Model 44. 
A primary feature of the display unit is a buffer storage 
of 8,192 bytes, which is used to store images for display 
regeneration purposes. The use of a buffer enables the 
display unit to operate concurrently with the com- 
puter system, freeing the main core and the channel 
for other functions. Additional features which greatly 
compress the image storage requirements are the 
absolute vector and character generator features. 

Operating system 

The operating system for the Laboratory is designated
as the Checkout Control System (CCS). It is
the operating system which is furnished with the IBM 
System/360 Model 44, with additions and modifi- 
cations to convert the system from a sequential batch 
job processor to a real-time multiprogramming pro- 
cessor. However, all the original functions and features 
have been retained. Programs not requiring the ele- 
ments of a real-time multiprogramming system may 
operate as though the additional facilities were not 
present. 

The principal area of the Model 44 Programming 
System (44PS) in which additions and changes have 
been made is the supervisor. The required functions of 
CCS include the ability to support various operations 
of computing at precise intervals of time. These oper- 
ations are selected by a priority scheme which controls 
the sequence of execution. Other operations are de- 
signed to execute as a result of interrupts induced 
outside the central processor. These are generally of 
such importance that their priorities are higher than 
operations initiated as a result of time. The function 
of multiprogramming through a scheme of priority 
interrupts and the requirement of real-time operation 
are the principal requirements for CCS. To satisfy 
these requirements, capabilities in three principal areas 
have been added. These are multiprogram scheduling,
real-time input/output scheduling, and application
program phasing control.

A principal element of the program scheduling fa- 
cility for CCS is the timer queue (Figure 3). It consists 
of a string of items ordered in ascending sequence of 
time-to-execute. Each item of the queue contains a 
pointer to the routine to be executed at the correspond- 
ing time. When the timer interrupt occurs, the timer 
processor routine gains control and the routine corre- 
sponding to the timer interrupt is placed into a state 
of execution. Its immediate or deferred execution is a 
function of priority levels. When a timer interrupt 
occurs, a comparison is made between the priority 
level of the routine currently in execution and the level 
of the routine for which the timer interrupt has oc- 
curred. If the level of the current routine is higher 
than or equal to the other, it resumes execution while 
the execution of the lower priority routine is deferred. 
Conversely, if the priority level of the current routine 
is lower, the other is placed immediately into execution, 
temporarily suspending the first. This method of 
scheduling uses the hardware priority interrupt system 
and additional software of CCS. 
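
The immediate-or-deferred decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the returned strings are hypothetical, and the convention that low numbers denote high priority is taken from the discussion of Figure 4.

```python
def on_timer_interrupt(current_level, timer_level):
    """Decide what happens to a timer-initiated routine.

    Levels follow the convention that a low number means a high priority.
    Returns "defer" when the current routine keeps control and "preempt"
    when the timer-initiated routine starts immediately.
    """
    if current_level <= timer_level:
        # Current routine is of higher or equal priority: it resumes,
        # and the timer-initiated routine waits.
        return "defer"
    # Current routine is of lower priority: it is temporarily suspended.
    return "preempt"
```

For example, a routine running on level 3 keeps control when a level 5 routine comes due, while a routine running on level 5 is suspended when a level 3 routine comes due.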

Figure 4 illustrates some of the conditions which
may occur with a typical combination of timer-initiated
priority routines. Notice that the execution priority
level of timer interrupts is coincident with the priority
level executing at that time. In addition, the
figure shows how the high priority routine gains control
from a lower priority routine. In this priority system,
low magnitude numbers correspond to high level priority.

Figure 3 — Timer queue

Figure 4 — Timer-initiated multiprogramming

Figure 3 illustrates a timer queue containing several
items which will initiate programs on various levels at 
different times. These items match the information 
illustrated in Figure 4. As each item reaches the top 
of the list, the internal interval timer is set to the 
increment of time from “now” until the program is to 
execute. When the timer expires, a priority level re- 
quest for the program is set, the item is removed from 
the queue, and an interval for the next item is calcu- 
lated. The program pointed to by the item which 
caused the timer interrupt is attached to its priority 
level for execution. When the queue becomes empty, 
the nonpriority level regains control. 
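
The handling of the timer queue described above can be approximated by the following Python sketch. It is not the CCS implementation: the class and method names are hypothetical, times are plain numbers in an assumed common unit, and a sorted Python list stands in for the string of queue items.

```python
import bisect


class TimerQueue:
    """Items ordered in ascending sequence of time-to-execute."""

    def __init__(self):
        self._items = []      # sorted by (time_to_execute, arrival order)
        self._arrivals = 0    # tie-breaker so equal times keep arrival order

    def enqueue(self, time_to_execute, level, routine):
        """Insert an item that points to `routine` on priority `level`."""
        item = (time_to_execute, self._arrivals, level, routine)
        bisect.insort(self._items, item)
        self._arrivals += 1

    def interval_from(self, now):
        """Increment of time from `now` until the top item is to execute."""
        if not self._items:
            return None       # queue empty: no timer interval to set
        return max(0, self._items[0][0] - now)

    def expire(self, now):
        """Timer expired: remove the top item and compute the next interval."""
        _, _, level, routine = self._items.pop(0)
        return level, routine, self.interval_from(now)
```

With items queued for times 30 and 10, the first interval set from time 0 is 10; when that timer expires, the next interval is 20, and after the last item is removed no interval remains.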

The second major feature of program scheduling is 
the supervision of priority interrupts by the priority 
interrupt executive. Certain ‘housekeeping’ functions 
are performed by this feature, such as register saving 
and restoring, as control passes up and down the pri- 
ority levels. Control is automatically given to the 
priority interrupt executive whenever any one of the 
32 levels is activated. The routine to be given control
is determined, registers are saved as required, and a 
pointer to parameters is set. Control is then given to 
the priority routine. When the routine concludes its 
operation, it returns control to the priority interrupt 
executive which restores registers and causes the rou- 
tine on the next highest level to resume or begin exe- 
cution. 
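
As a rough illustration of how control passes up and down the priority levels, the Python sketch below simulates the executive with a stack of active levels. All names are hypothetical, register saving is represented only by trace messages, and the value 32 for the nonpriority level is an assumption made so that it ranks below levels 0 through 31.

```python
NONPRIORITY = 32  # assumed marker for the nonpriority level


class PriorityInterruptExecutive:
    def __init__(self, routines):
        self.routines = routines      # level -> callable taking the executive
        self.pending = set()          # activated levels awaiting service
        self.active = [NONPRIORITY]   # stack of levels; the last one is executing
        self.trace = []               # record of simulated register handling

    def activate(self, level):
        """Activate a level; the executive then gains control."""
        self.pending.add(level)
        self._dispatch()

    def _dispatch(self):
        # Serve pending levels that outrank the executing level, highest first.
        while self.pending and min(self.pending) < self.active[-1]:
            level = min(self.pending)
            self.pending.remove(level)
            self.trace.append(f"save registers of level {self.active[-1]}")
            self.active.append(level)
            self.routines[level](self)    # control is given to the routine
            self.active.pop()             # the routine has returned control
            self.trace.append(f"restore registers of level {self.active[-1]}")
```

In this sketch a routine that activates a higher level is suspended at once, while a lower level it activates is held pending until the routine returns.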

Figure 5 illustrates the overall flow of data and 
control in CCS. Whereas Figure 4 illustrates the effect 
of program scheduling, this figure illustrates the me- 
chanics involved. A program currently executing may 
be interrupted by the timer (1). The timer processor 
selects data from the queue (2) and attaches the routine 
to execute (3). It sets a new interval in the timer (4) 
and initiates a priority interrupt (5) (assuming the 
routine is of a higher priority than the current program).
The priority interrupt executive determines the routine 
to execute (6) and gives control to the routine (7) 
which returns control (8) when finished. The executive 
then returns control to the interrupted program (9). 
At step 5, the condition may exist that the timer- 
initiated routine is of lower priority than the current 
program. If so, the timer processor returns control 
directly to the current program (10). 

Both application and system programs may queue 
routines using the timer queue (11). The actual queue- 
ing is done by a system routine. 

In Figure 5, the dashed line connecting “programs”
and “current program” is intended to show that the
current program is merely one of many that may be
selected by the priority interrupt executive.

Figure 5 — Program scheduling

The 44PS I/O channel scheduler was changed to 
handle a multiprogrammed environment. Among the 
changes were new real-time, core resident I/O device 
routines, a gated front end to the channel scheduler 
and I/O termination interrupt routine, and de-queueing
logic at the exit point of the channel scheduler. 

Of several options available to modify the channel 
scheduler, one was chosen which allows only one trans- 
action into the scheduler at a time. The effect of this 
method is to allow an I/O request to enter the scheduler 
and be serviced only if the scheduler is not currently 
processing a previous request. Due to multilevel pro- 
gram execution, an I/O request being made while 
another is being processed can occur only when the 
new request is of a higher priority level routine. There- 
fore, the request in process when interrupted by a 
higher priority level is resumed at its point of interrupt. 
The new higher priority level request is serviced im- 
mediately thereafter. 
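
The one-transaction-at-a-time behavior can be illustrated with the Python sketch below. It is a simplified model, not the 44PS channel scheduler: the class, its attributes, and the `interrupting` parameter (which simulates a higher priority request arriving while another is inside the scheduler) are all assumptions made for the example.

```python
from collections import deque


class GatedChannelScheduler:
    def __init__(self):
        self.busy = False        # the gate: True while a request is in process
        self.waiting = deque()   # requests held at the gate
        self.serviced = []       # order in which requests were serviced

    def request(self, name, interrupting=None):
        """Submit an I/O request.

        `interrupting` names a request from a higher priority level that
        arrives while this request is being processed.
        """
        if self.busy:
            self.waiting.append(name)     # gate closed: hold the request
            return
        self.busy = True                  # close the gate
        if interrupting is not None:
            self.request(interrupting)    # arrives mid-service and is held
        self.serviced.append(name)        # the request in process completes
        self.busy = False                 # open the gate
        while self.waiting:               # de-queueing logic at the exit point
            self.request(self.waiting.popleft())
```

A request interrupted by a higher priority request is therefore completed first, and the higher priority request is serviced immediately afterward.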

A posting function is associated with I/O termination. 
Its purpose is to allow priority level routines to request 
I/O, give up control on their level, and then regain 
control when the I/O is completed. 

To meet the demands of real-time I/O for the Flight
Computer, a special low overhead channel scheduler
is used. The gated channel scheduler using 44PS has
some features neither necessary nor required for the
Flight Computer channel.

The CCS phasing function provides the capability 
to load and initialize the required set of application 
programs under the control of operations initiated at 
the display console. An application core load (phase) 
contains all programs required in memory at the same 
time to perform one of the major simulation functions. 

The phases are resident on the system residence 
disk volume and are transferred to the application 
program area of central processor memory when re- 
quested by the flight programmer/engineer via the 
display console. The transfer is implemented by CCS 
through standard 44PS load capabilities. After a phase 
has been loaded, the unused portion of memory is 
calculated and designated as available work space which 
will later be used by the phase programs. The phase is 
given program control at its entry point on the non- 
priority level of program execution. 

Selection of the phase to be loaded and executed is 
made from the initial tutorial display, which is set up
by the initial loading of CCS. The user at the display 
console makes his selection of the phase via the light 
pen instrument. The selection of a phase initiates CCS 
operations which result in a core load of the application 
program area. When the user chooses to change from 
the execution of one phase to another, he requests the 
redisplay of the initial tutorial. Upon this request, CCS 
executes an orderly shutdown of the activities in pro- 
cess for the current phase and then reloads the central 
processor memory with the phase requested. Figure 6 
illustrates the concept of phased program loads and 
the general allocation of core for the application 
programs. 

Application software 

Application software in the Laboratory is designed 
to perform four basic tasks: (1) hardware diagnostics, 
(2) flight simulation initialization, (3) flight simulation 
execution, and (4) post-flight data reduction. A self-contained
set of software programs, called a phase,
has been constructed to perform each of these tasks. 
At any given time only one phase resides in core, with 
both the communication region and temporary data 
set residing on a disk device. 

The hardware diagnostics phase contains programs 
which perform the power-up and initialization function 
for the Flight Computer and its interface unit. The 
diagnostic programs are required for maintaining and 
servicing the interface unit. 

The flight simulation initialization phase consists of 
the programs to specify the details and options of the 
particular flight simulation the user wishes to make.
He may specify such items as loading, modifying, and 
accessing the flight program; digital command system 
orders; computer interface unit measurements; real- 
time output quantities; flight pause points; data to be 
saved for post-flight analysis; the particular Saturn 
vehicle to be simulated; and the type of simulation 
run to be made. 

The post-flight data reduction phase contains the 
software necessary to process data that were collected 
during execution of the real-time simulation phase. This 
consists of data from the Flight Computer, the 6-DOF 
simulator, and the FORTRAN flight program model. 
The capability of generating plots on the display unit 
is also provided, along with conversion, formatting, 
analysis, and outputting of data on the printer. 

In these three phases, very little use is made of the 
priority interrupt feature on the Model 44 as an in- 
strument of real-time operation. Practically all pro- 
grams are initiated by operator action at the display 
console, and programs receiving control operate on the 
priority level assigned to display control. This same 
level is reserved for display control in each phase. 
However, the entire real-time simulation phase is built 
around and is controlled by the priority interrupt 
feature. 


The application programs in this phase perform the
following tasks:

• 6-degree-of-freedom (6-DOF) launch vehicle
simulation.

• Digital command system simulation (ground data
link).

• FORTRAN equation and logic model execution
of the flight program.

• Data reduction and analysis.

Figure 7 presents an overview of the major appli- 
cation software components required for the real-time 
phase and their interrelationships. The operator, 
seated at the display control unit, communicates with 
the system through preformatted tutorial displays. 
Three data sets are used as an input interface between 
the display unit and the real-time application software. 
The flight and vehicle specifications data set is used to
structure the vehicle and flight program skeletons to 
any of the many missions under development. This 
data set is defined and generated during the initiali- 
zation phase. The two remaining data sets (vehicle 
perturbation and data reduction analysis and control) 
are accessible both during the initialization and real- 
time simulation phases. The vehicle perturbation 
functions allow the operator to specify various vehicle 
anomalies such as thrust perturbations, command 
receiver failures, staging or event failures, inertial 
platform failures, etc., in addition to the start time and 
duration for each. The appropriate control information
is ordered by time of occurrence and recorded in a
vehicle sequencing queue until the specified activation
time. A similar procedure is followed in the creation
of the data reduction analysis and control data set.

Figure 7 — Application software functional flow —
Real-time phase
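
The ordering of perturbation control information by time of occurrence might look like the following Python sketch. The record layout, the field names, and the use of seconds as the time unit are assumptions for illustration; the actual format of the vehicle sequencing queue is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perturbation:
    name: str
    start: float      # start time of the anomaly (assumed to be in seconds)
    duration: float   # how long the anomaly lasts (same assumed unit)


def build_sequencing_queue(perturbations):
    """Order the control information by time of occurrence."""
    return sorted(perturbations, key=lambda p: p.start)


def release_due(queue, sim_time):
    """Split the queue into entries whose activation time has been
    reached and entries that are still waiting."""
    due = [p for p in queue if p.start <= sim_time]
    waiting = [p for p in queue if p.start > sim_time]
    return due, waiting
```

Entries remain in the waiting part of the queue until the simulated time reaches their specified activation time.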

The FORTRAN flight program model is an engi- 
neering representation of the flight program. It serves 
as an additional reference to measure and evaluate 
actual flight software performance. 

The real-time data buffer receives data from the 
FORTRAN flight program model and the 6-DOF 
vehicle simulator as well as telemetry data from the 
flight hardware. This entire set of data is recorded on 
tape for the post-flight data reduction phase. Data 
selected for real-time observation is organized, for- 
matted, and recorded on the disk. This particular data 
is accessible at any time upon request from the display 
console as either tabular data or graphic plots. Such 
displays may be generated from historical data be- 
ginning at some particular point in the past and carried 
up through the current values, or they may start with
current values. In both cases, the display is continually 
updated from the real-time data buffer. In addition, 
the real-time tabular data may be permanently re- 
corded on the printer. 

The 6-DOF simulator consists of both rotational 
and translational dynamics as well as a simulation of 
the vehicle subsystems involved in vehicular control, 
such as sequencing, digital command system, etc. This 
simulator may be driven by either the actual flight 
hardware or the FORTRAN flight program model. In 
turn, the 6-DOF simulator supplies inputs to both 
the flight hardware and the FORTRAN flight program 
model. 

Each of these real-time application programs is 
assigned a relative priority and an absolute priority 
level. Figure 8 shows groups of application programs 
and their priority interrupt level assignments used in 
this system at the present time. In general, high pri- 
ority levels have short execution times. These routines 
respond to discrete external events or internal keying 
by the interval timer. Routines with longer execution 
times are on lower priority levels. Among these routines 
are real-time reduction and graphic support. The pri- 
ority level assignments, both hardware and software 
activated, can be changed easily to optimize system 
performance. 

The requirements of the real-time application
programs guided the design of the program scheduler
in CCS. As a result, the priority scheduler provides
the real-time application programs with a highly
flexible operating environment, making the following
system attributes possible. First, the System/360
Model 44 and the Flight Computer operate asynchronously
with respect to one another. This condition
relieves the system of several constraints in its operating
environment, which, if present, tend to constrict the
system. Second, the application programs are very
responsive to the information supplied from the interface
unit via the priority interrupt feature. There are
six high priority levels of the thirty-two which are
assigned to signals from the interface unit. This
structure permits immediate response to Flight Computer
conditions in the Model 44 by interrupting
programs operating on a lower priority level. Third,
and related to the second point, the low priority
operations (such as servicing display unit operations)
execute on a noninterference basis with the time
critical functions on higher priority level assignments.
Fourth, with respect to time-slicing, it is a self-adjusting
system. This means that programs on lower
priority levels will automatically give up time to
programs operating on higher levels. For example, the
solution rate on vehicle navigation can be changed by
simply altering one constant, which will result in the
self-adjustment of the system to the new solution
rate.

Figure 8 — Functional groupings of priority interrupt
line assignments (0-31) in real-time phase

User/system interaction

With reference to user/system interaction, the system 
may be said to have two primary objectives: to pro- 
vide a more detailed and complete checkout of flight 
programs, and to ease the burden and reduce the time 
required of a flight programmer to check out a flight
program. To meet these objectives, the following
system criteria were established:

1. Minimal knowledge of the central processor 
required of system users. 

2. Minimum number of people required to run a 
simulation. 

3. Centralized operator control stations.

4. Maximum influence on the flight program by 
the user. 

5. Minimum time required to set up runs.

6. Entire simulations run by nontechnical oper- 
ators. 

In order to satisfy these criteria, system start-up 
procedures were automated, peripheral device manage- 
ment routines were written (to allow tapes and disks 
to be remounted on arbitrary drives), direct access 
storage was fully utilized, program overlay was used 
extensively, and all operator control (after initial setup) 
was centralized at the display unit. 

The graphic display software provides the interface 
between the application programs and the IBM 2250 
Display Unit through which the user communicates
with the system. Through this software, the display 
console operator is in complete control of the flight 
program and has a very wide range of capabilities in 
initializing, controlling, monitoring, and analyzing the 
flight program performance. 

The display program that provides this interface 
operates in a real-time environment. Therefore, to 
reduce the core and time requirement necessary to 
create each individual display, all displays are prefor- 
matted by an off-line graphic program. (See Figure 9.) 
This program receives card images of the text and 
control information associated with each display and 
creates a ‘book’ of displays. This display book resides 
on a disk cartridge and is divided into one index and 
as many chapters as there are displays. Each display 
text is in an ‘expanded’ format containing embedded 
graphic orders and in a format ready for transmission
to the display unit buffer. It requires no editing, scan- 
ning, or unpacking in real time. The keyboard and light 
pen pages provide control information needed by the 
real-time display control program to respond to oper- 
ator keyboard and light pen inputs. For each chapter 
created by the off-line program, there is a corresponding 
entry in the index. Each index entry contains the name 
and disk address of its corresponding chapter. 
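
The organization of the display book into one index and one chapter per display can be sketched in Python as follows. A byte string stands in for the disk cartridge, and the function names are hypothetical. Storing a length alongside each disk address is an assumption of this sketch; the index entries described above hold only a name and a disk address.

```python
def build_display_book(displays):
    """Build an index and a simulated disk image from preformatted displays.

    `displays` maps a display name to its already expanded text (bytes).
    Returns (index, disk), where index[name] is (address, length).
    """
    index = {}
    disk = bytearray()
    for name, text in displays.items():
        index[name] = (len(disk), len(text))   # chapter starts at current end
        disk += text
    return index, bytes(disk)


def fetch_chapter(index, disk, name):
    """Retrieve a chapter through the in-core index. Nothing is edited,
    scanned, or unpacked, because the text is stored ready to transmit."""
    address, length = index[name]
    return disk[address:address + length]
```

Because the index is read into core once, fetching a display in real time costs one lookup and one transfer.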

During system initialization, the real-time display
control program reads the index into core and retrieves
the system initial display from the display book. The
initialize phase of the real-time display software is
complete when the initial options are displayed. Light
pen or keyboard inputs from the display console operator
are required to initiate the display of new texts.

Figure 9 — Display book generation and organization

The display device routine services the light pen and 
keyboard I/O interrupts and schedules the display 
control program on the timer queue for immediate 
execution on a predetermined priority interrupt level. 
When display control gets control on its priority inter- 
rupt level, the type of action taken by the operator is 
examined. The light pen page and the keyboard page of 
the current display chapter define all legal light pen 
inputs and keyboard entries. 

The display control program displays the proper text 
in response to the operator’s light pen actions, vali- 
dates keyboard inputs, and passes control information 
specified in the keyboard or light pen page. Response 
to the user’s inputs appears to be instantaneous to the 
operator (500 milliseconds maximum). When a longer 
time is needed to process the operator’s request, the 
program to perform the operation is scheduled to oper- 
ate on another priority level and normal display pro- 
cessing continues. The operator may initiate several 
tasks to be performed simultaneously. 

On a light pen detect, the light pen page of the current 
display chapter is examined for a possible NEXT 
PROGRAM. If one is specified, control is passed to it 
and the NEXT DISPLAY presented when the 
program returns control to display control. When the 
NEXT PROGRAM is omitted from the light pen page,
the NEXT DISPLAY is presented immediately and 
the display control program priority level is freed for 
additional operator inputs. 
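
The light pen logic just described can be illustrated with a short Python sketch. The dictionary layout of the light pen page, the key names, and the callback parameters are assumptions made for the example and do not reflect the actual page format.

```python
def on_light_pen_detect(light_pen_page, target, programs, show):
    """Handle a light pen detect on `target` of the current display.

    `light_pen_page` maps each legal target to an entry that names a
    NEXT DISPLAY and, optionally, a NEXT PROGRAM.
    """
    entry = light_pen_page[target]    # KeyError for a target the page omits
    next_program = entry.get("next_program")
    if next_program is not None:
        programs[next_program]()      # control passes to the program and
                                      # returns here when it finishes
    show(entry["next_display"])       # present the NEXT DISPLAY
```

When no NEXT PROGRAM is specified, the function goes straight to presenting the NEXT DISPLAY.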

Display control makes legality checks on all key- 
board input against the legal data in the keyboard page 
of the display chapter and passes the data to the des- 
ignated program. When it is necessary to input a 
large amount of data through the display compose 
fields, or when many displays and light pen actions are 
required to initiate a procedure, the light pen options 
and keyboard entries may be predefined on cards or 
disk. The display software can initiate one option 
after another and each time return to the predefined 
option set for another option rather than waiting on 
the console operator for further action. This speeds
the setup for procedures done repetitively and greatly 
reduces the possibility of operator errors. 
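
A minimal Python sketch of the legality check and of replaying a predefined option set is given below. The page layout, the field names, and the sample values are hypothetical, and the real display software also handles light pen options, which are omitted here.

```python
def is_legal(keyboard_page, field, value):
    """Check a keyboard entry against the legal data in the keyboard page."""
    return value in keyboard_page.get(field, ())


def replay_option_set(option_set, keyboard_page, deliver):
    """Initiate one predefined entry after another without waiting on the
    console operator. Legal entries are passed to the designated program
    through `deliver`; illegal ones are collected and returned."""
    rejected = []
    for field, value in option_set:
        if is_legal(keyboard_page, field, value):
            deliver(field, value)
        else:
            rejected.append((field, value))
    return rejected
```

The same check applies whether an entry is typed by the operator or read from a predefined set, so repetitive setups gain speed without bypassing validation.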

The real-time display control program accepts inputs 
from the display console operator and also from the 
application programs. While there can be many appli- 
cation programs providing input during a small interval 
of time, there can be only one display being presented to 
the operator. The inputs affecting future or past dis- 
plays are entered in a queue and may be viewed by the 
operator by use of the function keyboard. 



Figure 10 — Display control interface 


An application program on any priority interrupt 
level may use the display system to communicate with 
the user through previously defined input areas in 
the display text. These input areas may be defined 
by the user to suit his needs and to present his input 
data in an easy-to-read format. For example, if the 
input areas in a display are defined in a column format, 
the programmer’s data will automatically be presented 
in a column format when the input areas are filled. 

Figure 10 illustrates the display control interface 
with the display unit. The control information pages of 
a display chapter will remain in memory as long as 
the display text is being presented to the operator. 
When the operator uses the light pen or keyboard, 
display control will use these pages to determine the 
NEXT PROGRAM and NEXT DISPLAY. When the 
chapter for the next display is retrieved from the 
display book, the text page processor merges the display 
text with any application program data to be displayed 
and transfers the combined text and data to the dis- 
play buffer. The new light pen and keyboard pages 
will remain in memory to identify the next operator 
action. 
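
The merge performed by the text page processor can be pictured with the Python sketch below. It assumes, purely for illustration, that each input area is described by a character offset and a width within the display text; the actual representation of input areas is not specified here.

```python
def merge_text(display_text, input_areas, app_data):
    """Fill the predefined input areas of a display with application data.

    `input_areas` maps an area name to (offset, width) in `display_text`.
    Values are truncated or padded with blanks to the width of the area,
    so the layout of the display is preserved.
    """
    buffer = list(display_text)
    for name, (offset, width) in input_areas.items():
        value = str(app_data.get(name, ""))[:width].ljust(width)
        buffer[offset:offset + width] = value
    return "".join(buffer)
```

Since the areas fix where data appears, an application program that supplies values for areas laid out in a column has its data presented in that column without any formatting of its own.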

A function key must be lit by the function key pro- 
cessor before the key becomes active. An application 
program can direct display control to activate a function 
key and present a given control display when the oper- 
ator uses the key. 

As the user views the system, the heart is the dis- 
play system. The programmed book of tutorial displays 
is provided to give him complete control over the 
Flight Computer, the interface, and the simulation 
itself. The book along with the use of the light pen 
and the display keyboard leads the user through the 
functions of powering up the Flight Computer and 
Data Adapter, loading and accessing the Flight Com- 
puter memory and registers, setting up and executing 
the simulation, and post-processing simulation data. 
Each user option is carefully spelled out, and all user 
input is verified before it is accepted by the system. 
Should error conditions occur (due to incorrect input, 
hardware failure, or flight program failure), error mes- 
sages are presented to the user with instructions as 
to the recovery action. 

A complete history of user actions at the display 
unit is logged on the console typewriter for later refer- 
ence. 

Figures 11 and 12 demonstrate user activity at the 
graphic display terminal. By using the light pen, the 
user is able to travel through the display structure 
illustrated in Figure 11. Figure 12 depicts photographs 
of the displays represented by the structure of Figure 11. 


Along with the capability to set up and execute
complete runs through the display system, the user
has the ability to monitor the execution and take restart
dumps. These, too, are controlled from the display
console. A request may be made to print out
specific quantities on the printer as they are calculated
in either the Flight Computer or the central processor.
At the same time he may request various status displays,
tabular displays, or data plots to appear on the display
unit. Should an irregularity be detected, the operator
has the ability to pause the simulation, process
all the data acquired thus far, and make changes or
corrections. He then has the option to continue the
simulation, restart the simulation from various points
where restart information is available, or terminate
the run entirely.

Because of the complexity of the flight programs, 
the flexibility of the facilities of the Laboratory, and 
the desire to ease the burden of job setup as much as 
possible, a scheme has been implemented to sequence 
automatically from start to finish. The sequencing and 
input information can be saved on cards or in data sets 
on direct access storage. Through the use of this scheme, 
it is possible for a flight programmer to set up complex
runs and submit the job for running by an operator. 

All the pertinent information flowing through the
real-time data buffer is collected and saved on tape
(the ‘Post-Processor tape’) for later analysis. If the
flight programmer has requested SNAPS and TRACES
of actual Flight Computer memory locations during
instruction execution, this information is saved on the
Post-Processor tape. The Post-Processor tape may be
processed immediately or at a later date.

Figure 12 — Tutorial displays

When processing the tape, the user has several options 
available to him through use of the display system. He 
may selectively dump any data on the tape and request 
that the data be converted to decimal form in specified 
units prior to printing. He may have his data printed 
in a tabular form or he may plot data on the display 
unit. Special calculations may be performed on some 
of the data and the results printed or displayed. He 
may print or plot errors between various quantities to 
verify that the flight program results agree with the 
6-DOF simulator. 

CONCLUSION 

Designing an on-line software checkout facility
and ensuring that the programmer/engineer
has the capabilities he requires is a complex task.
The techniques of simulation, the selection of equipment,
and the methods employed for man-computer
and computer-computer interface must be carefully
weighed. The requirement for pin-point accuracy in the
Laboratory resulted in a real-time multiprogrammed
system which is proving an invaluable tool for assisting
in the development and checkout of the flight programs.
It has made possible the development of flight soft- 
ware which can be relied upon to a much greater extent 
than before and has reduced the amount of time 
necessary to produce it. In effect, the Laboratory pro- 
vides the necessary aids toward producing successful 
flight software. 


Some of the software concepts employed in the 
Laboratory may certainly be applied in related areas of 
simulation technology. The operating system and display
support software have direct conceptual application
in airborne and space vehicle simulators.

A hybrid frequency response technique
and its application to aircraft flight flutter
testing

INTRODUCTION

Large aircraft, such as the Lockheed C-5A, can be 
forced to resonate on the ground in a large number of 
closely coupled vibration modes which involve the 
combined motion of lifting and control surfaces, 
fuselage and engines. During flight, atmospheric 
disturbances can also excite these vibrational res- 
onances, though, under normal conditions, they are 
damped to a safe level because the airstream is able 
to extract energy from the vibrating structure. How- 
ever, there exists the aeroelastic phenomenon called 
flutter 1 — under certain conditions the structure is 
able to extract energy from the airstream and the 
amplitude of a resonance can very rapidly increase 
to a destructive level. Clearly, the damping of all 
resonances must remain positive throughout a wide 
range of flight conditions. This is verified by flight 
flutter test programs during which aircraft are proven 
safe at an airspeed and altitude before proceeding to 
a higher airspeed. In one method of flutter testing of 
large aircraft, the resonant modes are excited during 
flight by oscillatory forces from aerodynamic vanes. 2 
A frequency sweep technique is used; the frequency 
of the oscillatory forces is varied continuously from 
about 1 to 30 Hz. Accelerometers or other transducers 
indicate the response at various locations on the air- 
craft. After an excitation sweep, the frequencies and 
measures of damping of the resonances are determined, 
and a decision is made about the safety of a higher 
airspeed. 

The importance of time in a flight flutter test cannot
be overemphasized. Data should be gathered in
as short a time as possible in order to relieve the prob- 
lems of high speed, low altitude testing. Furthermore, 
the analysis should be completed very quickly to 
minimize non-productive flight time. A successful 
data reduction system reduces the time-consuming 
process of record analysis and increases the time availa- 
ble for engineering interpretations and decisions. This 
has been achieved by using the Lockheed-Georgia 
Company’s hybrid computing system, consisting of 
four Ci 5000 analog computers interfaced with a CDC 
6400 digital computer, and a new rapid frequency 
response analysis technique. 

Theory of flight flutter testing 

The most powerful techniques for system stability 
analysis have evolved from the study of sets of simul- 
taneous ordinary differential equations. 3 As a result, 
flutter testing is based on the assumption that the 
aircraft in flight can be represented as a linear lumped 
parameter system described by the equations: 

$$
\begin{aligned}
a_{11}(p)x_1 + a_{12}(p)x_2 + \cdots + a_{1k}(p)x_k &= f_1(t) \\
a_{21}(p)x_1 + a_{22}(p)x_2 + \cdots + a_{2k}(p)x_k &= f_2(t) \\
&\;\;\vdots \\
a_{k1}(p)x_1 + a_{k2}(p)x_2 + \cdots + a_{kk}(p)x_k &= f_k(t)
\end{aligned}
$$

The $x_k$ are the coordinates of the system, the $f_k(t)$ are the
exciting forces and the $a_{ij}(p)$ are quadratic functions of the
operator $p = d/dt$. Because linear systems obey the
superposition principle, one can, without loss of
generality, excite an aircraft at a single point. Thus all
inputs except $f_i(t)$ can be made zero. Using the Laplace
transform, transfer functions can be formed between
the $f_i$ and $x_j$, having the general form

$$
\frac{X_j(s)}{F_i(s)} = \frac{b_m s^m + \cdots + b_1 s + b_0}{a_n s^n + \cdots + a_1 s + a_0} = \frac{N(s)}{D(s)}
$$

where s is the transform variable.

Figure 1 — The data reduction system

Figure 2 — Data signal showing envelopes and zero
crossings

Although N(s) may change for several input-output
combinations, the denominator polynomial D(s) is
the same. D(s) = 0 is the characteristic equation of
the system and the location of its roots, $s = \sigma + j\omega$, in
the complex plane defines the stability of the system.
Each structural resonance corresponds to a conjugate
pair of roots. A flight flutter testing technique that
is used involves excitation of the aircraft by a sinusoidal
force input which is swept slowly from about 1 to
30 Hz. By measuring the frequency response (both
amplitude and phase) of each transducer signal $x_j$ to
the exciting force $f_i$, and by applying the frequently
used technique of Kennedy and Pancu, 4 $\sigma$ and $\omega$ can
be found for all lowly damped resonances. The technique
of Kennedy and Pancu is based on a relationship between
the damping and the rate of change of phase
with frequency (of $x_j$ relative to $f_i$) as the frequency
sweeps through a resonance. The variations of $\sigma$ and $\omega$
with airspeed and Mach number show the stability
trends of the aircraft. For lowly damped resonances,
$\sigma \approx -\zeta\omega$, where $\zeta$ is the familiar damping ratio 3 and
$\omega$ can be taken as the circular frequency at amplitude
resonance. In this paper the emphasis will be not on
the theory of flight flutter testing, but on the hybrid
technique for measuring the frequency response of
the transducer signals $x_j$ to the exciting force $f_i$.
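
The relation between a conjugate root pair and the damping ratio can be
checked numerically. The sketch below is an illustration only, not part of
the flutter analysis itself: it assumes a single resonance with the standard
second-order characteristic polynomial s^2 + 2*zeta*omega_n*s + omega_n^2,
and the function name and the numerical values are hypothetical.

```python
import cmath
import math


def resonance_root(zeta, omega_n):
    """Return the upper-half-plane root of s^2 + 2*zeta*omega_n*s + omega_n^2 = 0."""
    a, b, c = 1.0, 2.0 * zeta * omega_n, omega_n ** 2
    disc = cmath.sqrt(b * b - 4.0 * a * c)   # imaginary for zeta < 1
    return (-b + disc) / (2.0 * a)


# Hypothetical lowly damped resonance: 5 Hz, damping ratio 0.02.
zeta = 0.02
omega_n = 2.0 * math.pi * 5.0
s = resonance_root(zeta, omega_n)
sigma, omega = s.real, s.imag

# sigma equals -zeta*omega_n, and omega differs from omega_n by about
# 0.02 percent, so sigma is very nearly -zeta*omega.
print(sigma, -zeta * omega)
```

A negative sigma corresponds to a damped resonance; a root migrating toward
positive sigma as airspeed increases is the trend a flutter test watches for.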

The hybrid frequency response system 

The new data reduction system for the C-5A flight
flutter test is shown in Figure 1. Twenty of the data
signals, which are telemetered from the aircraft to
the ground receiving station, are further transmitted
in multiplexed frequency modulated form via 15
kHz lines to the hybrid computing area two miles
away. Discriminators restore the twenty signals to
analog form and nine of these are selected for further
processing. The first operation is bandpass filtering.
In each of nine heterodyne-type tracking filters, the
center frequency of the 2 Hz passband is continuously
tuned to track the excitation frequency sweep. The
outputs from the tracking filters oscillate about zero
volts and are available in real-time at one of the four
Ci 5000 analog computers. These filter outputs become
the data signals for input to the hybrid computer
and are quite clean sinusoids with slowly varying
frequency (Figures 2 and 3). The one excitation signal
exhibits approximately constant amplitude but the
eight response signals exhibit maxima at the aircraft
structural resonances.

Figure 3 — Typical data signal and the computer
generated envelopes

Data compression 

The next operation is the storing, in real-time in the
central memory of the CDC 6400 digital computer, of the
times at which all zero-crossings occur and of the peak
amplitudes of all cycles. The time of a positive-going
zero-crossing, such as $t_3$ in Figure 2, is stored with the
amplitude of the previous positive peak. A negative-
going zero-crossing time such as $t_4$ is stored with the
amplitude $a_3$ of the previous negative peak. This data
compression is possible since the outputs of the tracking 
filters are quite clean sinusoids and more frequent 
sampling would yield redundant data. If needed for 
other applications, further data compression could 
be achieved by discarding some amplitudes and zero- 
crossing times during parts of the frequency sweep 
which contain no structural resonances. The neces- 
sary real-time digital computing could be performed 
if the central processor time allocated to the program 
is greater than the ten percent presently used. 
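
The pairing of zero-crossing times with peak amplitudes can be sketched
digitally. The following is a minimal illustration, not the hybrid
implementation: it works on an already sampled signal and locates each
crossing by linear interpolation, which is an assumption of the sketch (the
hybrid system times the crossing with an interrupt and takes the peak from
the analog envelope circuit). The function name and the test signal are
hypothetical.

```python
import math


def compress(times, values):
    """Reduce a sampled, near-sinusoidal signal to (time, peak) records.

    One record is produced per zero-crossing: a positive-going crossing
    is paired with the previous positive peak, and a negative-going
    crossing with the previous negative peak.
    """
    records = []
    pos_peak = neg_peak = None   # peaks of the last completed half-cycles
    run_max = run_min = 0.0      # running extremes of the current half-cycle
    for k in range(1, len(values)):
        v0, v1 = values[k - 1], values[k]
        if v0 >= 0.0:
            run_max = max(run_max, v0)
        else:
            run_min = min(run_min, v0)
        if (v0 < 0.0) == (v1 < 0.0):
            continue  # no sign change between these two samples
        # Linear interpolation of the crossing time (assumption of this sketch).
        t_cross = times[k - 1] + (times[k] - times[k - 1]) * v0 / (v0 - v1)
        if v1 >= 0.0:  # positive-going: a negative half-cycle just ended
            neg_peak, run_min = run_min, 0.0
            if pos_peak is not None:
                records.append((t_cross, pos_peak))
        else:          # negative-going: a positive half-cycle just ended
            pos_peak, run_max = run_max, 0.0
            if neg_peak is not None:
                records.append((t_cross, neg_peak))
    return records


# Hypothetical test signal: 5 Hz sinusoid of amplitude 2, sampled at 1 kHz.
fs = 1000.0
times = [k / fs for k in range(1000)]
values = [2.0 * math.sin(2.0 * math.pi * 5.0 * t) for t in times]
records = compress(times, values)
print(len(values), "samples reduced to", len(records), "records")
```

The first crossing produces no record because no completed peak of the
required sign exists yet; every later crossing yields one time and one
amplitude, which is the two-words-per-half-cycle storage described above.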

The envelope detection circuit 

The peak amplitudes are generated by applying each 
data signal to one of the nine envelope detection cir- 
cuits (Figure 4). Each circuit consists of two nearly 
identical circuits; one for the positive side and one 
for the negative side of the data signal. Amplifiers A, 
B, C, D and E in Figure 4 form a positive envelope 
circuit utilizing two mode-controlled integrators (D 
and E) as a track/hold pair, and a first order loop 
(A, B and C) as a maximum-value circuit. Since the 
amplifier A represents a perfect diode, the loop acts 
like a high-gain lag when the input is greater than the 
output of B and the “diode” A is forward biased. When 
the input falls below the output of B, the “diode” 
becomes reverse biased and B is forced to hold at its 
last value. 

Figure 4 — The envelope detection circuit

As long as the input is positive, comparator output
U is true, B is in the compute mode, D is tracking B,
and E is holding the previous peak. As the input goes 
negative, U goes false, B resets to zero, preparing the 
maximum-value circuit for the next positive signal, 
D holds the last voltage from B, and E tracks the new 
peak from D. Thus, with the positive envelope on E 
updating on each negative-going zero-crossing of the 
input, the envelope has staircase-like discontinuities 
as shown in Figure 2, though it is smoother in practice 
when more slowly varying frequencies are used. Figure 
3 is a segment of typical aircraft data. The second 
half of each circuit generates the negative envelope 
in a similar manner (Figure 2). All of these envelope 
voltages are input continuously to analog-to-digital 
converters. A practical upper frequency limit to the 
circuit, using the gains in Figure 4, is 120 Hz. With 
other gains the useful frequency range could be shifted 
so that the upper frequency limit is approximately 
2000 Hz. 
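
The mode logic of the positive envelope circuit can be emulated in discrete
time. This is an idealized sketch under stated assumptions: the
maximum-value loop is treated as an exact running maximum rather than a
high-gain lag, the track/hold integrators are treated as instantaneous, and
the variable names b, d and e simply mirror amplifiers B, D and E of
Figure 4.

```python
def positive_envelope(samples):
    """Emulate the positive half of the envelope detection circuit.

    b is the maximum-value circuit; d and e are the track/hold pair.
    Returns the value of e (the positive envelope) after each sample.
    """
    b = d = e = 0.0
    envelope = []
    for x in samples:
        if x > 0.0:          # comparator output U is true
            b = max(b, x)    # B computes the running maximum
            d = b            # D tracks B, while E holds the previous peak
        else:                # comparator output U is false
            b = 0.0          # B resets for the next positive half-cycle
            e = d            # D holds its last value and E tracks it
        envelope.append(e)
    return envelope


# Two positive half-cycles with peaks 3 and 5 (hypothetical sample values).
signal = [0.0, 1.0, 3.0, 2.0, -1.0, -2.0, -1.0, 2.0, 5.0, 4.0, -1.0]
print(positive_envelope(signal))
```

The output steps from 0 to 3 when the input first goes negative and from 3
to 5 at the next negative-going crossing, which is the staircase behaviour
described for the envelope on E.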

The hybrid interface and data storage 

Each analog console, with its associated interface, 
contains 32 channels of analog-to-digital conversion. 
During real-time, when the hybrid system is sensitive 
to interrupts and peripheral processor patterns (pre- 
compiled I/O programs stored within and executed 
by one of the CDC 6400 peripheral processors), it is
possible to transfer nine data words from the sample-
and-hold amplifiers, via the analog-to-digital converters,
to central memory in less than 300 microseconds.

The actual mechanism of data storage is initiated
by leading edge I/O interrupts which cause the pre- 
viously defined patterns (programs) in the peripheral 
processor to transfer the data to a temporary buffer 
in central memory. No central processor time is re- 
quired for this operation. Upon completion of I/O, 
the central processor takes the data from the temporary 
buffer and packs it in an array with four data words 
per central memory word. Here it is stored until the 
frequency sweep is completed and the post real-time 
processing is begun. The leading edge I/O interrupt 
is activated asynchronously by a positive-going zero- 
crossing of one of the data channels. This causes the 
instantaneous digital value of the corresponding posi- 
tive envelope (from the analog-to-digital converter) 
to be stored together with the time at which the zero- 
crossing occurs. Similarly, negative-going zero-crossings 
set interrupts which initiate the storage of negative- 
going zero-crossing times and corresponding negative 
envelope amplitudes. 
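
The packing of four data words into one central memory word can be sketched
as follows. The CDC 6400 central memory word is 60 bits long; the 15-bit
field width, the unsigned encoding and the field ordering used here are
assumptions made for illustration, since the text does not give the data
word format, and the function names are hypothetical.

```python
FIELD_BITS = 15                      # assumed width of one data word
FIELD_MASK = (1 << FIELD_BITS) - 1
FIELDS_PER_WORD = 4                  # four data words per central memory word


def pack(fields):
    """Pack four unsigned 15-bit data words into one 60-bit integer."""
    if len(fields) != FIELDS_PER_WORD:
        raise ValueError("expected exactly four data words")
    word = 0
    for value in fields:
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("data word does not fit in 15 bits")
        word = (word << FIELD_BITS) | value
    return word


def unpack(word):
    """Recover the four data words, first-packed field first."""
    shifts = range(FIELD_BITS * (FIELDS_PER_WORD - 1), -1, -FIELD_BITS)
    return [(word >> shift) & FIELD_MASK for shift in shifts]


packed = pack([1, 2, 3, 32767])
print(unpack(packed))
```

Packing at this density is what lets central memory alone hold the data for
a complete sweep, at the cost of the shift-and-mask work that the central
processor performs during the real-time phase.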

During the real-time phase, when timing is most 
critical, the requirements on the central processor 
are reduced to that of transferring and packing data 
within central memory. This requires approximately 
ten percent of the central processor for nine channels 
at signal frequencies of 30 Hz. Thus the central processor 
is readily available to service other hybrid programs or 
batch digital programs as required. 

Using central memory only, data for up to 32,000 
signal cycles can be stored. It might be possible to 
increase this number greatly by using the disk file 
or magnetic tape. However, the Lockheed-Georgia 
Company's hybrid computing facility is a time-shared 
system and it was necessary to program this problem 
for time-sharing compatibility. The system contains 
a CDC 6638 disk and four CDC 607 tape drives, and 
is strongly file oriented, using the disk for intermediate 
file storage during input and output. In a time-critical 
problem such as flight flutter testing, the disk might 
not be available for mass data storage, since it can be 
in use on a non-interruptable channel for several 
seconds under certain I/O conditions. When real-time 
mass data storage is required, the 607 tape drives are 
generally used. These drives can be assigned to a 
specific problem and can normally be accessed within 
500 milliseconds. However, it is necessary to use a 
central memory buffer capable of storing approximately 
one second duration of data. 

If the restrictions of time-sharing are removed and 
if the system can be dedicated to flutter testing, the
disk, as well as the tapes, becomes available for real-time
data storage. The size of the program could be
increased even further by using multiple analog con- 
soles and interfaces to achieve parallel data conversion 
and transmission. 

The hybrid time measurement circuit 

A problem arose in the accurate measurement (within 
ten microseconds) of the time intervals between the 
interrupts on as many as nine channels. 

The problem was complicated by the possibility
of virtually simultaneous zero-crossings. Although
the digital computer includes a Precision Interval
Generator, which downcounts at a rate of 500 kHz,
attempts to use it for accurate timing of events
external to the digital computer were unsuccessful.
This was mainly because of the difficulty of handling
simultaneous interrupts and the effect of data-link
delays (including software delays and delays, which
could be of the order of a millisecond, arising when
the computer must finish a previously initiated or
higher priority task before attending to the next).

Figure 5 — The time measurement circuit

These difficulties were avoided by developing a new 
hybrid technique to take advantage of the sample/ 
hold feature of the analog-to-digital converters (ADC). 
This time measurement circuit is described with refer- 
ence to Figure 5. The basis is the generation of a time- 
synchronized voltage waveform which represents the 
fine count and is fed into one ADC for each of the 
data channels to be monitored. The waveform selected 
is a 0 — 100 — 0 volt triangular wave with a 20 milli- 
second period. It is generated by two complementary 
integrators controlled by the analog clock which counts
down frequencies from a megahertz crystal oscillator.
Because one integrator is always in reset while the 
other is integrating, the synchronization of the output 
at zero volts is assured at the beginning of each new 
cycle. 

By connecting each holding register (which goes 
true when an interrupt is set and remains true until 
the central processor begins action on the interrupt) 
to the sample/hold controller of the corresponding 
ADC, this fine count voltage can be held until the 
digital computer can read it. The coarse count is 
purely digital and is incremented at the beginning of 
each cycle by a subroutine which is called from the 
highest priority interrupt. A logical signal is required 
if the coarse count is incremented during the time 
between holding and reading an ADC. This signal 
is obtained by “ANDing” the holding registers of 
the coarse count and the zero-crossing interrupts. By 
using this signal to set a flip-flop which feeds a discrete 
control line, and by reading this line at the same time 
as the digital computer reads the ADC, the coarse 
count portion of the stored time may be decremented 
if necessary. 

The triangular wave is an easily synchronized signal 
with no discontinuities. For timing purposes, however, 
it is necessary to know whether the wave is ramping 
up or down at the time of reading. This is achieved by 
using, for each channel, a flip-flop tied into a discrete 
line. The flip-flop normally tracks the analog clock,
but maintains the present state when the holding 
register indicates that the interrupt is in progress. 
A zero-crossing then triggers an interrupt which simul- 
taneously initiates the digital computer, holds the 
ADC and the ramp up/down flip-flop, and actuates the 
gate of the coarse count warning flip-flop. After recog- 
nizing the interrupt, the digital computer simply reads 
the ADC and the two discrete lines and stores their 
values together with the coarse count. Further action 
may be postponed until the post-real-time phase. 
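
The post-real-time reconstruction of an event time from the stored readings
can be sketched as below. The 20 millisecond period and the ten volt per
millisecond slope come from the description of the triangular wave; the
function and parameter names are hypothetical, and the sketch assumes that
the coarse count starts at zero and counts whole cycles of the wave.

```python
PERIOD_S = 0.020          # triangular wave period (20 milliseconds)
SLOPE_V_PER_S = 10.0e3    # ten volts per millisecond


def event_time(coarse, volts, ramp_up, coarse_warning):
    """Reconstruct an interrupt time in seconds from the stored readings.

    coarse          -- coarse count read by the digital computer
    volts           -- fine count voltage held by the ADC (0 to 100 V)
    ramp_up         -- state of the ramp up/down flip-flop at hold time
    coarse_warning  -- True if the coarse count was incremented between
                       holding and reading the ADC
    """
    if coarse_warning:
        coarse -= 1                     # undo the late increment
    fine = volts / SLOPE_V_PER_S        # seconds from the nearest zero-volt point
    if not ramp_up:
        fine = PERIOD_S - fine          # falling half of the cycle
    return coarse * PERIOD_S + fine


# Event on the rising ramp, 2.5 ms into cycle 298.
print(event_time(298, 25.0, True, False))
# Event held 50 microseconds before the end of cycle 298, but read only
# after the coarse count had already advanced to 299.
print(event_time(299, 0.5, False, True))
```

Because the ramp direction and the warning flag are stored with each
reading, none of this arithmetic has to be done while data is arriving.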

TABLE I — Successive zero crossing times for five
channels interrupted simultaneously at 5 Hz.

ZERO CROSSING TIMES in seconds
CHANNEL 1  CHANNEL 2  CHANNEL 3  CHANNEL 4  CHANNEL 5
                                 5.969986   5.969988
6.169990   6.169991   6.169991   6.169990   6.169991
6.369991   6.369902   6.369991   6.369991   6.369992
6.569988   6.569968   6.569988   6.569987   6.569989
6.769969   6.769940   6.769990   6.769989   6.769991
6.969992   6.969902   6.969991   6.969991   6.969993
7.169989   7.169969   7.169989   7.169989   7.169989
7.369989   7.369900   7.369989   7.369989   7.369901
7.569994   7.569994   7.569993   7.569994   7.569995
7.769989   7.769900   7.769989   7.769989   7.769991
7.969989   7.969969   7.969989   7.969988   7.969989
8.169992   8.169992   8.169991   8.169991   8.169993
8.369990   8.369991   8.369991   8.369990   8.369991
8.569968   8.569989   8.569980   8.569988   8.569989
8.769991   8.769991   8.769991   8.769991   8.769992

With a ten volt per millisecond excursion of the
analog triangular wave, the ADC's are able to resolve
the voltages within 0.1 volts. This is equivalent to a
timing accuracy of ten microseconds. Better accuracy 
could be achieved by balancing the integrators and 
the ADC's for off-set and drift. Typical results from 
a system without special balancing are presented in 
Tables I and II. Identical channels were interrupted 
simultaneously by an analog clock. Table I contains 
interrupt times (zero-crossing times) for successive 
interrupts at 5 Hz while Table II contains similar re- 
sults for interrupts at 100 Hz. The times of simul- 
taneous events on all channels differ by no more than 
three microseconds. Also the periods between succes- 
sive interrupts on any one channel differ by no more 
than three microseconds. At zero-crossing frequencies 
as high as 100 Hz, the nine channels of the flight 
flutter program can be sampled with this same accuracy. 
It is possible to read all 32 ADC channels at each ana- 
log console within one millisecond so that, at a sampling 
rate of 100 Hz, each interface channel is idle 90 
percent of the time. Thus the data frequency or the 
number of channels could be increased significantly 
without loss of accuracy. 
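
The two figures quoted in this paragraph follow from simple arithmetic,
restated here as a check. The symbols are introduced for this note only:
$\Delta V$ is the ADC voltage resolution, $dV/dt$ the slope of the
triangular wave, $t_{\text{read}}$ the time to read all channels of one
console, and $T$ the interval between interrupts at 100 Hz.

```latex
\[
\Delta t = \frac{\Delta V}{dV/dt}
         = \frac{0.1\ \text{V}}{10\ \text{V/ms}}
         = 0.01\ \text{ms}
         = 10\ \mu\text{s}
\]
\[
\text{idle fraction} = 1 - \frac{t_{\text{read}}}{T}
                     = 1 - \frac{1\ \text{ms}}{10\ \text{ms}}
                     = 0.9
\]
```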

TABLE II — Successive zero crossing times for five
channels interrupted simultaneously at
100 Hz.

ZERO CROSSING TIMES in seconds
CHANNEL 1  CHANNEL 2  CHANNEL 3  CHANNEL 4  CHANNEL 5
.179992    .179992    .179992    .179993    .179991
.189994    .189994    .189994    .189994    .189994
.199992    .199991    .199992    .199992    .199992
.209994    .209994    .209994    .20999?    .209994
.219992    .219992    .219992    .219993    .219991
.229994    .229994    .229994    .229995    .229994
.239993    .239992    .239992    .239993    .239991
.249994    .249994    .249994    .249994    .249994
.259993    .259992    .259992    .259993    .259992
.26999?    .269994    .269994    .269995    .269995
.279993    .27999?    .27999?    .279992    .27999?
.289994    .289994    .289994    .289994    .289994
.299993    .29999?    .299992    .299992    .299991
.309994    .309994    .309994    .309995    .309994
.319991    .31999?    .31999?    .319993    .319992
.329994    .329994    .329994    .329996    .329994
.33999?    .339991    .339992    .339992    .339992
.349994    .349994    .349994    .349996    .349994
.359993    .35999?    .359993    .359994    .35999?
.369994    .369994    .369994    .369996    .369994
.379993    .379991    .379992    .379992    .379992
.389994    .389994    .389994    .389995    .389994
.399991    .39999?    .399992    .399992    .399992
.409994    .409993    .409994    .409994    .409994
.41999?    .419990    .419991    .419992    .419991
.429994    .429994    .429994    .429994    .429993
.439993    .439991    .439992    .439992    .43999?
.449994    .449994    .449994    .449995    .449994
.45999?    .459991    .459992    .459992    .459991
.469994    .469994    .469994    .469996    .469994
.47999?    .479991    .479992    .479992    .47999?
.489994    .489994    .489994    .489994    .489994
.499993    .499991    .499992    .499992    .499992
.509994    .509994    .509994    .509995    .509994
.51999?    .51999?    .519992    .519993    .51999?
.529995    .529994    .529994    .529994    .529994
.539993    .53999?    .539992    .539993    .53999?
.549995    .549994    .549994    .549995    .549994
.55999?    .559992    .559991    .559992    .559993
.569995    .569994    .569994    .56999?    .569994

Post real-time processing

With a maximum lag of only a few cycles after
occurrences on the aircraft, a digital description of
nine signals can be stored in the CDC 6400. The computer
can be ordered to start or stop accepting real-time
data either at the console or by remote switches
in the flight test monitoring room. After a stop order
the stored data is immediately processed. Both signal
zero-crossing times and amplitudes undergo conven- 
tional digital smoothing. Next an amplitude-versus- 
frequency history is generated for all nine signals 
from their peak amplitude values and the time intervals 
between zero-crossings of the excitation signal. A 
phase-versus-frequency history of each response signal, 
relative to the excitation, follows by comparing zero- 
crossing times in each response with those in the ex- 
citation. Thus the hybrid computer is used as a fre- 
quency response (or transfer function) analyzer. By 
searching through the values of the response envelopes, 
it is able to find the resonances, calculate their fre- 
quencies, and normalize their amplitudes by the cor- 
responding amplitudes of the excitation. An increase 
with airspeed of the normalized amplitude of a resonance
can indicate a decrease in its damping and in
this way the aircraft stability trends can be followed.
If the frequency sweep is sufficiently slow and if the 
actual forcing of the aircraft is accurately represented 
by the excitation signal, the computer can use the 
phase information and the technique of Kennedy and 
Pancu 4 to separate closely coupled resonances and
calculate their damping. Because these post real-time 
operations involve conventional digital programming, 
details are irrelevant in this presentation. 
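
For orientation, the core of the frequency response calculation can be
sketched in a few lines. This is a simplified illustration under stated
assumptions, not the program used in the tests: smoothing is omitted, only
positive-going crossings are used, entry n of the response lists is assumed
to belong to excitation cycle n, and a response crossing later than the
excitation crossing is reported as a negative (lagging) phase. All names
are hypothetical.

```python
def frequency_response(exc_times, exc_peaks, resp_times, resp_peaks):
    """Return (frequency_hz, amplitude_ratio, phase_deg) per excitation cycle.

    All four lists hold one entry per positive-going zero-crossing.
    """
    rows = []
    for n in range(len(exc_times) - 1):
        # Frequency from the interval between excitation zero-crossings.
        period = exc_times[n + 1] - exc_times[n]
        frequency = 1.0 / period
        # Phase from the delay of the response crossing within that period.
        phase = -360.0 * (resp_times[n] - exc_times[n]) / period
        phase = (phase + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
        # Response amplitude normalized by the excitation amplitude.
        rows.append((frequency, resp_peaks[n] / exc_peaks[n], phase))
    return rows


# Hypothetical 10 Hz excitation with a response lagging by a quarter cycle.
rows = frequency_response(
    exc_times=[0.0, 0.1, 0.2],
    exc_peaks=[2.0, 2.0, 2.0],
    resp_times=[0.025, 0.125, 0.225],
    resp_peaks=[6.0, 6.0, 6.0],
)
for frequency, ratio, phase in rows:
    print(round(frequency, 3), round(ratio, 3), round(phase, 3))
```

Scanning the amplitude ratio for maxima gives the resonance frequencies,
and the phase column supplies the rate of change of phase with frequency on
which the technique of Kennedy and Pancu relies.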

Typically, the answers, from eight response signals 
and a sweep from 1 to 30 Hz lasting 120 seconds, 
begin to appear on the line printer approximately 
three seconds after the end of the sweep. For its ver- 
satility a facsimile machine is used to transmit copies 
of the line printer output at eight pages per minute 
to its remote terminal in the flight test monitoring 
room. A remote line printer or display scope could 
quite easily have been used. 

SUMMARY 

The hybrid frequency response technique has made 
possible very rapid data reduction during aircraft 
flight flutter testing when time saving is extremely 
important. Previously, such data reduction has been 
performed in post real-time, to a large extent by hand, 
from chart recordings. The savings in aircraft flight 
time, and the increased number of channels which can
be analyzed, fully justify the use of a large computer. 
It is worth comparing this hybrid system with other 
systems which were considered. 

A different approach could be based on the sampling 
of the data signals at such a high frequency that peak 
amplitudes and zero-crossing times could be detected 
digitally, in post real-time, by interpolation between 
the samples. When the signals are quite clean sinusoids
of slowly varying frequency, this method leads to much 
redundant data and a large storage requirement. 
Furthermore, it was found that the use of nine data 
signals and frequencies up to 30 Hz requires that the 
computer accept a prolonged data input rate far 
greater than its capability. 

Some data compression can be achieved by the use 
of the Fast Fourier Transform 6 which requires a mini- 
mum sampling rate of at least twice the highest fre- 
quency of interest. 6 Thus, a sweep from 1 to 30 Hz 
with 120 seconds duration requires at least 7,200 
samples per channel. This is approximately twice 
the number taken by the hybrid technique. On the
CDC 6400 a Fast Fourier Transform of 8192 samples 
takes approximately sixty seconds per channel using 
software. This is prohibitively long for flight flutter 
testing when compared with the three seconds for
nine channels taken by the hybrid technique. A Fast
Fourier Transform using hardware would be much 
faster but such a unit was not available. Transform 
techniques are more applicable to transient and 
random signals than to slow frequency sweeps. 
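
The sample counts in this comparison can be reproduced with a short
calculation. Two points are assumptions of the sketch rather than
statements from the text: that the 8192-sample transform length arises as
the smallest power of two holding the minimum sample count, and that the
sweep is linear in frequency, so that its mean frequency is the average of
the two limits.

```python
f_low, f_high = 1.0, 30.0      # sweep limits in Hz
duration = 120.0               # sweep duration in seconds

# Minimum sample count at twice the highest frequency of interest.
nyquist_samples = int(2 * f_high * duration)

# Smallest power-of-two transform length that holds those samples.
fft_length = 1
while fft_length < nyquist_samples:
    fft_length *= 2

# Hybrid technique: two records (one per zero-crossing) per signal cycle.
# A linear sweep is assumed, so the mean frequency is (f_low + f_high) / 2.
cycles = 0.5 * (f_low + f_high) * duration
hybrid_records = int(2 * cycles)

print(nyquist_samples, fft_length, hybrid_records)
```

Under the linear-sweep assumption the hybrid technique stores 3,720 records
per channel, roughly half of the 7,200 samples required by the transform
approach, which agrees with the factor of about two quoted above.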

Separate commercial frequency response analyzers 
for each data channel could be interfaced with a digital 
computer through analog-to-digital converters but 
it is extremely difficult to justify the purchase of a 
number of such units when a very large hybrid com- 
puter is available. Certainly a digital computer is 
needed to perform the many logical operations which 
separate the important resonances and discard less 
important ones. To obtain numerical values of 
damping, digital operations appear necessary. The 
hybrid computer has the additional advantage of
making possible many convenient forms of system 
control and display to further aid in saving aircraft 
flight time. 

This data reduction system was developed for use 
in a flight flutter test program but it should be adapt- 
able to other situations calling for very fast reduction 
of slow frequency sweeps. The present application 
requires only one analog console and about 25 percent 
of the available central memory but it could be ex- 
panded to use all four analog consoles and interfaces 
to give a capability for 40 data channels. Of course, 
this would dedicate the system. Some elements, such 
as the hybrid time measurement circuit, could find 
even wider application.